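I'm trying to work out the logic Elasticsearch uses when it ranks results by score.
I have 4 indexes in total, and I'm querying all of them for a single term. The query I'm using is: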
GET /_all/static/_search
{
"query": {
"match": {
"name": "chinese"
}
}
}
The (partial) response I get is:
{
"took": 17,
"timed_out": false,
"_shards": {
"total": 40,
"successful": 40,
"failed": 0
},
"hits": {
"total": 6,
"max_score": 2.96844,
"hits": [
{
"_shard": 1,
"_node": "Hz9L2DZ-ShSajaNvoyU8Eg",
"_index": "restaurant",
"_type": "static",
"_id": "XecLkyYNQWihuR2atFc5JQ",
"_score": 2.96844,
"_source": {
"name": "Just Chinese"
},
"_explanation": {
"value": 2.96844,
"description": "weight(name:chinese in 1) [PerFieldSimilarity], result of:",
"details": [
{
"value": 2.96844,
"description": "fieldWeight in 1, product of:",
"details": [
{
"value": 1,
"description": "tf(freq=1.0), with freq of:",
"details": [
{
"value": 1,
"description": "termFreq=1.0"
}
]
},
{
"value": 4.749504,
"description": "idf(docFreq=3, maxDocs=170)"
},
{
"value": 0.625,
"description": "fieldNorm(doc=1)"
}
]
}
]
}
},
{
"_shard": 1,
"_node": "Hz9L2DZ-ShSajaNvoyU8Eg",
"_index": "restaurant",
"_type": "static",
"_id": "IAUpkC55ReySjvl9Xr5MVw",
"_score": 2.96844,
"_source": {
"name": "The Chinese Hut"
},
"_explanation": {
"value": 2.96844,
"description": "weight(name:chinese in 5) [PerFieldSimilarity], result of:",
"details": [
{
"value": 2.96844,
"description": "fieldWeight in 5, product of:",
"details": [
{
"value": 1,
"description": "tf(freq=1.0), with freq of:",
"details": [
{
"value": 1,
"description": "termFreq=1.0"
}
]
},
{
"value": 4.749504,
"description": "idf(docFreq=3, maxDocs=170)"
},
{
"value": 0.625,
"description": "fieldNorm(doc=5)"
}
]
}
]
}
},
{
"_shard": 2,
"_node": "Hz9L2DZ-ShSajaNvoyU8Eg",
"_index": "cuisine",
"_type": "static",
"_id": "6",
"_score": 2.7047482,
"_source": {
"name": "Chinese"
},
"_explanation": {
"value": 2.7047482,
"description": "weight(name:chinese in 1) [PerFieldSimilarity], result of:",
"details": [
{
"value": 2.7047482,
"description": "fieldWeight in 1, product of:",
"details": [
{
"value": 1,
"description": "tf(freq=1.0), with freq of:",
"details": [
{
"value": 1,
"description": "termFreq=1.0"
}
]
},
{
"value": 2.7047482,
"description": "idf(docFreq=1, maxDocs=11)"
},
{
"value": 1,
"description": "fieldNorm(doc=1)"
}
]
}
]
}
},
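My question: I understand that Elasticsearch gives higher scores to shorter field values, so why are results like "Just Chinese" and "The Chinese Hut" from the restaurant index ranked above the best match, "Chinese", from the cuisine index? As far as I know, I didn't use any special analyzers or anything else when inserting these documents into the indexes. Everything is default.
What am I missing, and how do I get the expected results?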
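One of the important parameters in calculating the score is the inverse document frequency (IDF). By default, each Elasticsearch shard estimates the global IDF from its own local IDF. That works when you have a lot of similar records evenly distributed across the shards. However, when you have only a few records, or when you combine results from multiple shards holding different kinds of records (cuisine names and restaurant names), the estimated IDF can produce strange results. You can see this in your output: the restaurant shard reports idf(docFreq=3, maxDocs=170) = 4.749504, while the cuisine shard reports idf(docFreq=1, maxDocs=11) = 2.7047482, so the same term is weighted differently depending on which shard scored it. The solution to this problem is to use Elasticsearch's dfs_query_then_fetch search type.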
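To make the effect concrete, here is a small sketch that reproduces the numbers from the explain output in the question. It assumes the classic Lucene TF-IDF formulas, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which match the values shown. The function names are made up for illustration, and the "shared statistics" case at the end is hypothetical: it simply pools the two shards' counts to show the direction of the change, not what your cluster would actually return.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Classic Lucene TF-IDF inverse document frequency (assumed formula)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(term_freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as listed in the explain output."""
    tf = math.sqrt(term_freq)
    return tf * classic_idf(doc_freq, max_docs) * field_norm


# Per-shard statistics taken from the explain output in the question.
restaurant = field_weight(1.0, doc_freq=3, max_docs=170, field_norm=0.625)
cuisine = field_weight(1.0, doc_freq=1, max_docs=11, field_norm=1.0)
print(f"local stats:  restaurant={restaurant:.5f} cuisine={cuisine:.5f}")

# Hypothetical: score both documents with the same pooled statistics
# (docFreq = 3 + 1, maxDocs = 170 + 11). Only fieldNorm differs now,
# so the shorter field "Chinese" wins.
shared_restaurant = field_weight(1.0, doc_freq=4, max_docs=181, field_norm=0.625)
shared_cuisine = field_weight(1.0, doc_freq=4, max_docs=181, field_norm=1.0)
print(f"shared stats: restaurant={shared_restaurant:.5f} cuisine={shared_cuisine:.5f}")
```

With local statistics the restaurant documents win only because their shard's IDF is much larger. Once both documents share the same IDF, the field-length norm decides the order, which is the behaviour you were expecting.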
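By the way, to see how Elasticsearch calculated the score, you can use the explain parameter in the search request or on the URL. So when you ask questions about scoring, it helps to include output that was produced with explain set to true.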
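As a rough sketch of how the search type and the explain flag could be set on the URL, the snippet below only builds the request and does not send it. The host http://localhost:9200 and the helper name build_search_request are assumptions for illustration. The index, type, and query body are the ones from the question.

```python
import json
from urllib.parse import urlencode


def build_search_request(base_url: str, index: str, doc_type: str, field: str, text: str):
    """Return (url, body) for a match query using DFS scoring with explain output."""
    # search_type and explain are passed as URL query parameters.
    params = urlencode({"search_type": "dfs_query_then_fetch", "explain": "true"})
    url = f"{base_url}/{index}/{doc_type}/_search?{params}"
    body = json.dumps({"query": {"match": {field: text}}})
    return url, body


# Hypothetical local node; replace with your own host.
url, body = build_search_request("http://localhost:9200", "_all", "static", "name", "chinese")
print(url)
print(body)
```

Sending that body to that URL with any HTTP client should give you scores computed from term statistics gathered across all the shards being searched, along with the explanation for each hit.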