Solr: Edgengram and dashes in data



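I have a schema in which I want two fields to be edgengram so that I can do "starts-with" searches.

The data in one of the fields contains only digits, and that field works (querying 2 returns only numbers that start with 2). The other field, however, does not work as well.

That field holds values such as FLB-PRO, FLB-GJE, NKF-KFE, etc. When I search this field for FLB-PRO, for example, I also get hits on FLB-GJE, which is really not what I expect from a "starts-with" search. Searching for PRO gets me closer to what I want: only FLB-PRO is included in the results.

Since both fields use the same type, I think it has something to do with the dashes in the data, but I am pretty much blank on how to avoid the problem.

My edgengram field definition: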

<!-- Similar to text_general, but does edgengram filtering (~"startswith") -->
<fieldType name="text_general_edgengram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="40"/>

    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
  </analyzer>
</fieldType>

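Edit: After some more tinkering, it seems the dash in the query is treated as whitespace at query time. I have tried escaping the - in the query, but that did not work.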


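It turns out I had to change the tokenizer. StandardTokenizer treats dashes as whitespace, so a query containing two letter sequences separated by a dash is treated as two words. A query for FLB-PRO is therefore searched as FLB and PRO, and the FLB part on its own also matches FLB-GJE.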

I fixed it by changing the tokenizer class for both the query and the index analyzer to solr.KeywordTokenizerFactory. This treats the whole string as one word, regardless of dashes.
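
For reference, here is a minimal sketch of what the changed field type could look like. It is not the exact definition I ended up with: the name text_keyword_edgengram is made up for illustration, and I am assuming the synonym and stop word filters can be dropped because the values are short codes rather than free text. Lowercasing is placed before the edge n-gram filter so that the indexed prefixes and the query term are normalized the same way.

```xml
<!-- Sketch only: "starts-with" field type that keeps dashes inside a single token.
     The name text_keyword_edgengram is hypothetical; rename to fit your schema. -->
<fieldType name="text_keyword_edgengram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Emit the whole field value as one token, so FLB-PRO is not split at the dash -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Index every prefix of the value: f, fl, flb, flb-, flb-p, ... -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="40"/>
  </analyzer>
  <analyzer type="query">
    <!-- No n-gramming at query time: the typed prefix must equal one indexed gram -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a type like this, a query for FLB-P should match FLB-PRO but not FLB-GJE, since the only indexed grams are prefixes of the full value. Existing documents need to be reindexed after the index analyzer changes, because the old tokens stay in the index until then.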
