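Imagine I have three sets of data (SetA, SetB, SetC) and three customers. My first customer can access SetA and SetB, the second can access SetA and SetC, and the third can use SetB and SetC. I could create one Elasticsearch index per customer, so each index would hold the following data sets:
Index1  Index2  Index3
------  ------  ------
SetA    SetA    SetB
SetB    SetC    SetC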
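I then query the right index depending on the customer. This is simple, but it does involve duplicating data.
Alternatively, I could create a single index containing all three sets of data:
Index
-----
SetA
SetB
SetC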
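I would then add a filter to the query so that only records from the correct sets are considered as results. This would work, but I'm concerned that this single-index solution won't return the same query results as the multi-index approach.
I believe, though I'm happy to be corrected, that when internal scoring is involved (e.g. relevance and term frequency), the index takes every record it contains into account. So a single index with filtering would not produce the same results as the multi-index approach. Is this assumption correct?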
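For reference, Elasticsearch's default similarity is BM25, and Lucene computes its inverse document frequency from collection statistics: N is the number of documents in the shard that have the field, and n_t is how many of them contain term t. These statistics cover every document in the shard, not just the ones that pass a filter. The numbers in the second line are purely hypothetical (a customer with 100 documents in its own index versus a shared index of 300 documents, with the term occurring only in that customer's 10 documents):

$$
\mathrm{idf}(t) = \ln\!\left(1 + \frac{N - n_t + 0.5}{n_t + 0.5}\right)
$$

$$
\text{own index: } N=100,\ n_t=10 \;\Rightarrow\; \ln\!\left(1 + \tfrac{90.5}{10.5}\right) \approx 2.264
\qquad
\text{shared index: } N=300,\ n_t=10 \;\Rightarrow\; \ln\!\left(1 + \tfrac{290.5}{10.5}\right) \approx 3.356
$$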
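If you first filter the results by customer ID and only then search, the filter clause itself has no effect on relevance scoring, so you can and should merge this data into a single Elasticsearch index rather than creating 3 different indices for this purpose.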
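A minimal Python sketch of generating that per-customer query; `build_customer_query` is a hypothetical helper name, and the `customer-id` and `setA` field names are taken from the example below. The full-text clause goes in `must` (it contributes to `_score`), while the customer restriction goes in `filter` (it only includes or excludes documents):

```python
import json


def build_customer_query(customer_id: int, field: str, text: str) -> dict:
    """Hypothetical helper: full-text match on `field`, limited to one customer.

    The match clause is in `must`, so it is scored; the term clause is in
    `filter`, so it only narrows the result set and adds nothing to _score.
    """
    return {
        "query": {
            "bool": {
                "must": [{"match": {field: text}}],
                "filter": [{"term": {"customer-id": customer_id}}],
            }
        }
    }


if __name__ == "__main__":
    # Prints the same request body as the example search query for customer 1
    print(json.dumps(build_customer_query(1, "setA", "first"), indent=2))
```

The resulting dict can be sent as the JSON body of a `_search` request against the shared index.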
Let me show you with a small example. Index mapping:
{
"mappings": {
"properties": {
"setA": {
"type": "text"
},
"setB": {
"type": "text"
},
"setC": {
"type": "text"
},
"customer-id": {
"type": "long"
}
}
}
}
Sample documents:
{
"setA" : "first customer",
"setB" : "first customer",
"setC" : "",
"customer-id" : 1
}
{
"setA" : "first customer set A",
"setB" : "first customer set B",
"setC" : "",
"customer-id" : 1
}
{
"setA" : "second customer",
"setC" : "second customer",
"customer-id" : 2
}
{
"setA" : "second customer set A",
"setC" : "second customer set C",
"customer-id" : 2
}
{
"setB" : "third customer",
"setC" : "third customer",
"customer-id" : 3
}
{
"setB" : "third customer set B",
"setC" : "third customer set C",
"customer-id" : 3
}
Search query:
{
"query": {
"bool": {
"must": [ --> this would match and order according to relevance score
{
"match": {
"setA": "first"
}
}
],
"filter": [ --> this keeps only customer 1's docs and does not affect the score
{
"term": {
"customer-id": 1
}
}
]
}
}
}
Search result:
"hits": [
{
"_index": "so_query_filter",
"_type": "_doc",
"_id": "1",
"_score": 0.8025915, --> relevance is high
"_source": {
"setA": "first customer",
"setB": "first customer",
"setC": "",
"customer-id": 1 --> only cust-1 doc
}
},
{
"_index": "so_query_filter",
"_type": "_doc",
"_id": "2",
"_score": 0.60996956, --> relevance is lower because setA has more words than in the first doc
"_source": {
"setA": "first customer set A",
"setB": "first customer set B",
"setC": "",
"customer-id": 1 --> only cust-1 doc
}
}
]
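As a sanity check, the two scores above can be reproduced by hand. This sketch assumes the classic BM25 formula with Lucene's idf, Elasticsearch's default parameters k1 = 1.2 and b = 0.75, a single-shard index, and the standard analyzer (so "first customer set A" is 4 tokens). It ignores Lucene's lossy encoding of field lengths; that happens not to matter here, since the reproduced values match the reported ones. It also computes what the same query would score if customer 1's documents lived in an index of their own:

```python
import math

K1, B = 1.2, 0.75  # Elasticsearch's default BM25 parameters


def bm25(tf, dl, avgdl, doc_count, doc_freq):
    """Single-term score: classic BM25 tf saturation with Lucene's idf."""
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    tf_norm = tf * (K1 + 1) / (tf + K1 * (1 - B + B * dl / avgdl))
    return idf * tf_norm


def scores(field_lengths, matching_lengths):
    """Score every matching doc ("first" occurs once in each) for one index.

    field_lengths: token counts of setA for every doc in the index that has setA
    matching_lengths: token counts of setA for the docs that contain "first"
    """
    avgdl = sum(field_lengths) / len(field_lengths)
    doc_count = len(field_lengths)
    doc_freq = len(matching_lengths)
    return [bm25(1, dl, avgdl, doc_count, doc_freq) for dl in matching_lengths]


single_index = [2, 4, 2, 4]  # setA lengths of docs 1-4 (docs 5-6 have no setA)
per_customer = [2, 4]        # docs 1-2 only, if customer 1 had its own index
matching = [2, 4]            # docs 1 and 2 contain "first"

print(scores(single_index, matching))  # ~[0.8026, 0.6100]
print(scores(per_customer, matching))  # ~[0.2111, 0.1604]
```

The absolute scores differ between the two layouts because idf depends on which documents the index holds, but in this example the relative order of customer 1's two documents is the same either way.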