在Elasticsearch中查询非规范化树数据

ug_

我将树数据存储在Elasticsearch 7.9中,数据结构概述如下。我正在尝试编写一个查询,该查询可以提供在其下最多的孩子的前10个孩子


设定数据

给定这个示例树:

示例树

由ES中的以下数据描述:

{ "id": "A", "name": "User A" }
{ "id": "B", "name": "User B", "parents": ["A"], "parent1": "A" }
{ "id": "C", "name": "User C", "parents": ["A"], "parent1": "A" }
{ "id": "D", "name": "User D", "parents": ["A", "B"], "parent1": "B", "parent2": "A" }
{ "id": "E", "name": "User E", "parents": ["A", "B", "D"], "parent1": "D", "parent2": "B", "parent2": "A" }

每个字段都是映射类型 keyword

文档字段是:

  • “ id”-文档ID,与_id相同,
  • “父级”-文档的所有父级;如果它是根节点,则为空
  • “ parent1”-文档的父级
  • “ parent2”-文档的祖父母
  • “父母N ”-第N个曾祖父(5岁以下)

所需结果

我想找到用户A的所有“父母”和count孩子总数因此,在此示例情况下,结果将是

User B - 2
User C - 0

自己测试

PUT test_index
PUT test_index/_mapping
{
  "properties": {
    "id": { "type": "keyword" },
    "name": { "type": "keyword" },
    "referred_by_sub": { "type": "keyword" },
    "parents": { "type": "keyword" },
    "parent1": { "type": "keyword" },
    "parent2": { "type": "keyword" },
    "parent3": { "type": "keyword" },
    "parent4": { "type": "keyword" },
    "parent5": { "type": "keyword" }
  }
}

POST _bulk
{ "index" : { "_index" : "test_index", "_id" : "A" } }
{ "id": "A", "name": "User A" }
{ "index" : { "_index" : "test_index", "_id" : "B" } }
{ "id": "B", "name": "User B", "parents": ["A"], "parent1": "A" }
{ "index" : { "_index" : "test_index", "_id" : "C" } }
{ "id": "C", "name": "User C", "parents": ["A"], "parent1": "A" }
{ "index" : { "_index" : "test_index", "_id" : "D" } }
{ "id": "D", "name": "User D", "parents": ["A", "B"], "parent1": "B", "parent2": "A" }
{ "index" : { "_index" : "test_index", "_id" : "E" } }
{ "id": "E", "name": "User E", "parents": ["A", "B", "D"], "parent1": "D", "parent2": "B", "parent2": "A" }

最终结果来自乔的答案

如果将来有人来这里,我希望将最终结果与公认的答案有所不同。我的包括生成的文档源以及数组。这些都不在要求中,因为我试图使我的问题尽可能简单。

也许将来会对某人有所帮助。

询问

GET test_index/_search
{
  "size": 0,
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "id": "A"
          }
        },
        {
          "term": {
            "parents": "A"
          }
        }
      ]
    }
  },
  "aggs": {
    "children_counter": {
      "scripted_metric": {
        "init_script": "state.ids_vs_children = [:]; state.root_children = [:]",
        "map_script": """
          def current_id = doc['id'].value;
          if (!state.ids_vs_children.containsKey(current_id)) {
            state.ids_vs_children[current_id] = new ArrayList();
          }
          
          if(doc['parent1'].contains(params.id)) {
            state.root_children[current_id] = params._source;
          }
          
          def parents = doc['parents'];
          if (parents.size() > 0) {
            for (def p : parents) {
              if (!state.ids_vs_children[current_id].contains(p)) {
                if (!state.ids_vs_children.containsKey(p)) {
                  state.ids_vs_children[p] = new ArrayList();
                }
                state.ids_vs_children[p].add(current_id);
              }
            }
          }
        """,
        "combine_script": """
          def results = [];
          for (def pair : state.ids_vs_children.entrySet()) {
            def uid = pair.getKey();
            if (!state.root_children.containsKey(uid)) {
              continue;
            }
            
            def doc_map = [:];
            doc_map["doc"] = state.root_children[uid];
            doc_map["num_children"] = pair.getValue().size();
            results.add(doc_map);
          }
      
          def final_result = [:];
          final_result['count'] = results.length;
          final_result['results'] = results;
          return final_result;
        """,
        "reduce_script": "return states",
        "params": {
          "id": "A"
        }
        
      }
    }
  }
}

输出量

{
  "took" : 9,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "children_counter" : {
      "value" : [
        {
          "count" : 2,
          "results" : [
            {
              "num_children" : 1,
              "doc" : {
                "parent1" : "A",
                "name" : "User B",
                "id" : "B",
                "parents" : [
                  "A"
                ]
              }
            },
            {
              "num_children" : 0,
              "doc" : {
                "parent1" : "A",
                "name" : "User C",
                "id" : "C",
                "parents" : [
                  "A"
                ]
              }
            }
          ]
        }
      ]
    }
  }
}
乔·索罗辛

您的非规范化树已经包含了该计算所需的所有内容,但是在遍历子级并跟踪引用时,我们需要访问其他文档的父级,这是脚本化指标聚合的理想用例

GET test_index/_search
{
  "size": 0,
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "id": "A"
          }
        },
        {
          "term": {
            "parents": "A"
          }
        }
      ]
    }
  },
  "aggs": {
    "children_counter": {
      "scripted_metric": {
        "init_script": "state.ids_vs_children = [:];",
        "map_script": """
          def current_id = doc['id'].value;
          if (!state.ids_vs_children.containsKey(current_id)) {
            state.ids_vs_children[current_id] = new ArrayList();
          }
          
          def parents = doc['parents'];
          if (parents.size() > 0) {
            for (def p : parents) {
              if (!state.ids_vs_children[current_id].contains(p)) {
                state.ids_vs_children[p].add(current_id);
              }
            }
          }
        """,
        "combine_script": """
          def final_map = [:];
          for (def pair : state.ids_vs_children.entrySet()) {
            def uid = pair.getKey();
            if (params.exclude_users != null && params.exclude_users.contains(uid)) {
              continue;
            }
            
            final_map[uid] = pair.getValue().size();
          }
      
          return final_map;
        """,
        "reduce_script": "return states",
        "params": {
          "exclude_users": ["A"]
        }
      }
    }
  }
}

屈服

...
"aggregations" : {
  "children_counter" : {
    "value" : [
      {
        "B" : 2,    <--
        "C" : 0,    <--
        "D" : 1,
        "E" : 0
      }
    ]
  }
}

强烈建议您进行顶层查询,以免浪费CPU资源,因为众所周知,b / c脚本会占用大量资源。需要顶层查询以将其限制为仅A的子代。

提示:如果您不太经常更新这些用户,建议您在建立索引之前执行此子级计算-您必须迭代某个位置,所以为什么不在ES之外?

本文收集自互联网,转载请注明来源。

如有侵权,请联系[email protected] 删除。

编辑于
0

我来说两句

0条评论
登录后参与评论

相关文章

来自分类Dev

数据非规范化

来自分类Dev

AngularFire-如何查询非规范化数据?

来自分类Dev

AngularFire-如何查询非规范化数据?

来自分类Dev

mongo中的规范化数据与非规范化数据

来自分类Dev

规范化来自非规范化表的数据

来自分类Dev

如何对R中的数据进行规范化和非规范化?

来自分类Dev

MongoDb Doctrine Symfony 2中的非规范化数据

来自分类Dev

如何在Firebase中写入非规范化数据

来自分类Dev

更新Cassandra中的非规范化数据

来自分类Dev

更新Cassandra中的非规范化数据

来自分类Dev

更新 Cassandra 中的非规范化数据

来自分类Dev

如何对 MySQL 中的数据进行非规范化?

来自分类Dev

AngularFire Loop非规范化数据

来自分类Dev

jq:数据非规范化

来自分类Dev

数据库非规范化?

来自分类Dev

数据库视图可从单个非规范化表中模拟规范化表

来自分类Dev

带有非规范化表的弹簧数据规范

来自分类Dev

elasticsearch:保留冗余(非规范化)数据还是保留ID列表以进行交叉引用?

来自分类Dev

如何在不进行非规范化的情况下优化数据库查询?

来自分类Dev

Tableau:规范化或非规范化的表在读取查询上的性能更好?

来自分类Dev

从规范化表中获取数据

来自分类Dev

从规范化表中获取数据

来自分类Dev

Java中数据集的规范化

来自分类Dev

SQL中的数据规范化(父子)

来自分类Dev

如何设置 elasticsearch-rails 非规范化映射

来自分类Dev

为什么当我使用规范化数据时在kmeans中获得嵌套集群,而当我使用非规范化数据时却得到非重叠集群?

来自分类Dev

对表进行非规范化后查询

来自分类Dev

非规范化的深度

来自分类Dev

表非规范化

Related 相关文章

热门标签

归档