Lucene: file exists, but cannot be retrieved using QueryParser

Posted by debugcn in Dev

Qing

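Basically, I built an index for 85k HTML files (Google results pages whose keywords are different university names), and I use each page's title as a field named "title" in my Lucene index. When I search with a query such as "duquesne AND university", no results come back. However, when I change the query to just "duquesne", I get a result whose title is "title:Duquesne Univeristy - Google Search". Why does this happen? The second attempt shows that the file titled Duquesne Univeristy has been indexed, yet I cannot retrieve it with the first query. Thanks a lot!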

Here is the code I use to build the index; I use Jsoup to get the title from each web page:

     // indexDir is the directory that hosts Lucene's index files
     File   indexDir = new File("F:\\luceneIndex"); 

     Directory myindex=SimpleFSDirectory.open(indexDir);
     // dataDir is the directory that hosts the text files to be indexed
     File   dataDir  = new File("I:\\luceneTextFiles"); 
     Analyzer luceneAnalyzer = new StandardAnalyzer(Version.LUCENE_CURRENT); 
     File[] dataFiles  = dataDir.listFiles(); 
     IndexWriterConfig indexConfig=new IndexWriterConfig(Version.LUCENE_CURRENT,luceneAnalyzer);
     IndexWriter indexWriter = new IndexWriter(myindex, indexConfig); 
     long startTime = new Date().getTime(); 
     System.out.println("Total file number is  "+dataFiles.length+"");
     for(int i = 0; i < dataFiles.length; i++){ 
          if(dataFiles[i].isFile() && dataFiles[i].getName().endsWith(".txt")){
               org.jsoup.nodes.Document t=Jsoup.parse(dataFiles[i], "UTF-8");                  
               Document document = new Document(); 
               Reader txtReader = new FileReader(dataFiles[i]); 
               document.add(new Field("title",t.title(),Field.Store.YES,Field.Index.ANALYZED));
               document.add(new Field("path",dataFiles[i].getCanonicalPath(),Field.Store.YES,Field.Index.NOT_ANALYZED)); 
               document.add(new Field("count",i+"",Field.Store.YES,Field.Index.NOT_ANALYZED));
               document.add(new Field("contents",txtReader)); 
               indexWriter.addDocument(document); 

          } 
     } 

     //indexWriter.getCommitData();
     indexWriter.close(); 
     long endTime = new Date().getTime(); 

This is the code I use for searching:

        String queryKey="duquesne";
        String subqueryKey="university";
        String queryField="contents";
        String subqueryField="title";
        /*
         * 0------>normal search
         * 1------>range search
         * 2------>prefix search
         * 3------>combine search
         * 4------>phrase query
         * 5------>wild card query
         * 6------>fuzzy query
         */
        int querychoice=0;

        //initialize the directory
        File indexDir=new File("F:\\luceneIndex");
        Directory directory=SimpleFSDirectory.open(indexDir);
        IndexReader reader=IndexReader.open(directory);
        //initialize the searcher
        IndexSearcher searcher=new IndexSearcher(reader);
        Analyzer analyzer=new StandardAnalyzer(Version.LUCENE_CURRENT);
        Query query;
        switch(querychoice){

        case 0:
            QueryParser parser=new QueryParser(Version.LUCENE_CURRENT,subqueryField,analyzer);
            query=parser.parse(queryKey);
            break;

Conqueror

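Parsing "title:Duquesne Univeristy - Google Search" with the StandardAnalyzer produces the query "title:duquesne defaultfield:univeristy defaultfield:google defaultfield:search": the "title:" prefix applies only to the first term, the remaining terms go to the parser's default field, and the terms are OR-connected.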
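
To make the field assignment above concrete, here is a minimal toy model in plain Java. It is not Lucene's real QueryParser and makes no claim about its internals; the class name DefaultFieldSketch, the method parse, and the whitespace-and-lowercase "analysis" are all assumptions made for illustration. It only mimics the one rule described above: a "field:" prefix binds to the single term it is attached to, and every other term falls back to the default field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Toy model only: NOT Lucene's real QueryParser (hypothetical helper). */
final class DefaultFieldSketch {

    /**
     * Splits a query string on whitespace and assigns each term to a field.
     * A "field:" prefix applies only to the term it is attached to;
     * every other term falls back to defaultField.
     */
    static List<String> parse(String queryText, String defaultField) {
        List<String> clauses = new ArrayList<>();
        for (String token : queryText.trim().split("\\s+")) {
            String field = defaultField;
            String term = token;
            int colon = token.indexOf(':');
            if (colon > 0) {
                field = token.substring(0, colon);
                term = token.substring(colon + 1);
            }
            // Rough stand-in for analysis: lowercase and drop punctuation.
            term = term.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{N}]", "");
            if (!term.isEmpty()) {
                clauses.add(field + ":" + term);
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        // prints [title:duquesne, defaultfield:univeristy, defaultfield:google, defaultfield:search]
        System.out.println(parse("title:Duquesne Univeristy - Google Search", "defaultfield"));
    }
}
```

In actual Lucene query syntax, parentheses apply one field to several terms, for example title:(duquesne AND university). Printing the Query object returned by parser.parse(...) shows which field each term ended up in, which is a quick way to check a query that unexpectedly returns nothing.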

Edited on 2021-06-06
