我将树数据存储在Elasticsearch 7.9中,数据结构概述如下。我正在尝试编写一个查询,该查询可以提供在其下最多的孩子的前10个孩子。
给定这个示例树:
由ES中的以下数据描述:
{ "id": "A", "name": "User A" }
{ "id": "B", "name": "User B", "parents": ["A"], "parent1": "A" }
{ "id": "C", "name": "User C", "parents": ["A"], "parent1": "A" }
{ "id": "D", "name": "User D", "parents": ["A", "B"], "parent1": "B", "parent2": "A" }
{ "id": "E", "name": "User E", "parents": ["A", "B", "D"], "parent1": "D", "parent2": "B", "parent2": "A" }
每个字段都是映射类型 keyword
文档字段是:
我想找到用户A的所有“父母”和count
孩子总数。因此,在此示例情况下,结果将是
User B - 2
User C - 0
PUT test_index
PUT test_index/_mapping
{
"properties": {
"id": { "type": "keyword" },
"name": { "type": "keyword" },
"referred_by_sub": { "type": "keyword" },
"parents": { "type": "keyword" },
"parent1": { "type": "keyword" },
"parent2": { "type": "keyword" },
"parent3": { "type": "keyword" },
"parent4": { "type": "keyword" },
"parent5": { "type": "keyword" }
}
}
POST _bulk
{ "index" : { "_index" : "test_index", "_id" : "A" } }
{ "id": "A", "name": "User A" }
{ "index" : { "_index" : "test_index", "_id" : "B" } }
{ "id": "B", "name": "User B", "parents": ["A"], "parent1": "A" }
{ "index" : { "_index" : "test_index", "_id" : "C" } }
{ "id": "C", "name": "User C", "parents": ["A"], "parent1": "A" }
{ "index" : { "_index" : "test_index", "_id" : "D" } }
{ "id": "D", "name": "User D", "parents": ["A", "B"], "parent1": "B", "parent2": "A" }
{ "index" : { "_index" : "test_index", "_id" : "E" } }
{ "id": "E", "name": "User E", "parents": ["A", "B", "D"], "parent1": "D", "parent2": "B", "parent2": "A" }
如果将来有人来这里,我希望将最终结果与公认的答案有所不同。我的包括生成的文档源以及数组。这些都不在要求中,因为我试图使我的问题尽可能简单。
也许将来会对某人有所帮助。
询问
GET test_index/_search
{
"size": 0,
"query": {
"bool": {
"should": [
{
"term": {
"id": "A"
}
},
{
"term": {
"parents": "A"
}
}
]
}
},
"aggs": {
"children_counter": {
"scripted_metric": {
"init_script": "state.ids_vs_children = [:]; state.root_children = [:]",
"map_script": """
def current_id = doc['id'].value;
if (!state.ids_vs_children.containsKey(current_id)) {
state.ids_vs_children[current_id] = new ArrayList();
}
if(doc['parent1'].contains(params.id)) {
state.root_children[current_id] = params._source;
}
def parents = doc['parents'];
if (parents.size() > 0) {
for (def p : parents) {
if (!state.ids_vs_children[current_id].contains(p)) {
if (!state.ids_vs_children.containsKey(p)) {
state.ids_vs_children[p] = new ArrayList();
}
state.ids_vs_children[p].add(current_id);
}
}
}
""",
"combine_script": """
def results = [];
for (def pair : state.ids_vs_children.entrySet()) {
def uid = pair.getKey();
if (!state.root_children.containsKey(uid)) {
continue;
}
def doc_map = [:];
doc_map["doc"] = state.root_children[uid];
doc_map["num_children"] = pair.getValue().size();
results.add(doc_map);
}
def final_result = [:];
final_result['count'] = results.length;
final_result['results'] = results;
return final_result;
""",
"reduce_script": "return states",
"params": {
"id": "A"
}
}
}
}
}
输出量
{
"took" : 9,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"children_counter" : {
"value" : [
{
"count" : 2,
"results" : [
{
"num_children" : 1,
"doc" : {
"parent1" : "A",
"name" : "User B",
"id" : "B",
"parents" : [
"A"
]
}
},
{
"num_children" : 0,
"doc" : {
"parent1" : "A",
"name" : "User C",
"id" : "C",
"parents" : [
"A"
]
}
}
]
}
]
}
}
}
您的非规范化树已经包含了该计算所需的所有内容,但是在遍历子级并跟踪引用时,我们需要访问其他文档的父级,这是脚本化指标聚合的理想用例。
GET test_index/_search
{
"size": 0,
"query": {
"bool": {
"should": [
{
"term": {
"id": "A"
}
},
{
"term": {
"parents": "A"
}
}
]
}
},
"aggs": {
"children_counter": {
"scripted_metric": {
"init_script": "state.ids_vs_children = [:];",
"map_script": """
def current_id = doc['id'].value;
if (!state.ids_vs_children.containsKey(current_id)) {
state.ids_vs_children[current_id] = new ArrayList();
}
def parents = doc['parents'];
if (parents.size() > 0) {
for (def p : parents) {
if (!state.ids_vs_children[current_id].contains(p)) {
state.ids_vs_children[p].add(current_id);
}
}
}
""",
"combine_script": """
def final_map = [:];
for (def pair : state.ids_vs_children.entrySet()) {
def uid = pair.getKey();
if (params.exclude_users != null && params.exclude_users.contains(uid)) {
continue;
}
final_map[uid] = pair.getValue().size();
}
return final_map;
""",
"reduce_script": "return states",
"params": {
"exclude_users": ["A"]
}
}
}
}
}
屈服
...
"aggregations" : {
"children_counter" : {
"value" : [
{
"B" : 2, <--
"C" : 0, <--
"D" : 1,
"E" : 0
}
]
}
}
强烈建议您进行顶层查询,以免浪费CPU资源,因为众所周知,b / c脚本会占用大量资源。需要顶层查询以将其限制为仅A的子代。
提示:如果您不太经常更新这些用户,建议您在建立索引之前执行此子级计算-您必须迭代某个位置,所以为什么不在ES之外?
本文收集自互联网,转载请注明来源。
如有侵权,请联系[email protected] 删除。
我来说两句