Redshift: Get the rank of a row, filtered by a condition

Posted by debugcn in Dev

塞伦西亚

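Every time I add a row to a table, I want to know how that row ranks compared to the table up to that point. That is easy to do with the RANK() window function. However, I have been struggling to find a way to get its rank compared to the table up to that point, filtered by a value.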

As an example, I want to end up with this highly contrived table:

    date    |   name  | animal_bought | num_sloths_bought_before | num_camels_bought_before
------------+---------+---------------+--------------------------+--------------------------
 2014-09-01 | Vincent | sloth         | 0                        | 0
 2014-09-01 | Luis    | camel         | 0                        | 0
 2014-09-02 | Vincent | sloth         | 1                        | 0
 2014-09-02 | Luis    | camel         | 0                        | 1
 2014-09-02 | Kevin   | sloth         | 0                        | 0
 2014-09-03 | Vincent | camel         | 2                        | 0
 2014-09-04 | Deo     | camel         | 0                        | 0
 2014-09-04 | Vincent | sloth         | 2                        | 1
 2014-09-05 | Luis    | camel         | 0                        | 2
 2014-09-05 | Andrew  | sloth         | 0                        | 0

At first I looked for a way to apply a filter to the window function (e.g. RANK() OVER(PARTITION BY name WHERE animal_bought = 'sloth' ORDER BY date ASC) AS num_sloths_bought_before), but that is not valid syntax. Then I tried adding a subquery, like this:

SELECT
  date,
  name,
  animal_bought,
  ( SELECT
      RANK() OVER(PARTITION BY name ORDER BY date ASC) - 1
    FROM this_table
    WHERE animal_bought = 'sloth'
  ) AS num_sloths_bought_before
FROM source_table

But Redshift threw this error:

ERROR:  This type of correlated subquery pattern is not supported yet

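I also tried putting the window function inside a CASE statement (which throws the same error) and computing the rank in a join query (which I could not get to work).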

Gordon Linoff

Hmmm. I don't think this query would do what you want anyway:

SELECT date, name, animal_bought,
       (SELECT RANK() OVER(PARTITION BY name ORDER BY date ASC) - 1
        FROM this_table
        WHERE animal_bought = 'sloth'
       ) AS num_sloths_bought_before
FROM source_table

There are several reasons:

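The use of rank() suggests that more than one row in this_table matches animal_bought. Otherwise, you could use an aggregation function.
If only one row matched the where clause, rank() would always be 1 (so the expression would always be 0), because the where clause is processed before rank().
Your question mentions only one table, but your query uses two (this_table and source_table).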
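To make the point about evaluation order concrete, the sketch below runs a filtered rank against the question's sample rows. It assumes Python's built-in sqlite3 module with SQLite 3.25+ (for window functions) as a local stand-in for Redshift; both engines evaluate WHERE before window functions. The table name source_table and the rows are copied from the question.

```python
import sqlite3

# Assumption: SQLite 3.25+ stands in for Redshift so the SQL can run locally.
# Sample purchases copied from the question: (date, name, animal_bought).
PURCHASES = [
    ("2014-09-01", "Vincent", "sloth"),
    ("2014-09-01", "Luis", "camel"),
    ("2014-09-02", "Vincent", "sloth"),
    ("2014-09-02", "Luis", "camel"),
    ("2014-09-02", "Kevin", "sloth"),
    ("2014-09-03", "Vincent", "camel"),
    ("2014-09-04", "Deo", "camel"),
    ("2014-09-04", "Vincent", "sloth"),
    ("2014-09-05", "Luis", "camel"),
    ("2014-09-05", "Andrew", "sloth"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source_table (date TEXT, name TEXT, animal_bought TEXT)")
conn.executemany("INSERT INTO source_table VALUES (?, ?, ?)", PURCHASES)

# WHERE removes the camel rows first; RANK() then only sees the remaining
# sloth rows, so camel purchases never receive a value at all.
filtered = conn.execute("""
    SELECT date, name, animal_bought,
           RANK() OVER (PARTITION BY name ORDER BY date) - 1 AS rnk
    FROM source_table
    WHERE animal_bought = 'sloth'
    ORDER BY date, name
""").fetchall()

for row in filtered:
    print(row)
```

Only the five sloth purchases come back, which is why filtering inside the ranking query cannot produce a per-row count for every purchase.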

Perhaps you just want rank() without a subquery?

SELECT date, name, animal_bought,
       RANK() OVER (PARTITION BY name, animal_bought ORDER BY date ASC) - 1 as NumberBoughtBefore
FROM source_table;
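A quick way to see what this per-animal rank returns is to run it against the question's sample rows. This sketch assumes Python's sqlite3 module with SQLite 3.25+ as a local stand-in for Redshift; source_table and the data are copied from the question.

```python
import sqlite3

# Assumption: SQLite 3.25+ (window functions) stands in for Redshift here.
# Sample purchases copied from the question: (date, name, animal_bought).
PURCHASES = [
    ("2014-09-01", "Vincent", "sloth"),
    ("2014-09-01", "Luis", "camel"),
    ("2014-09-02", "Vincent", "sloth"),
    ("2014-09-02", "Luis", "camel"),
    ("2014-09-02", "Kevin", "sloth"),
    ("2014-09-03", "Vincent", "camel"),
    ("2014-09-04", "Deo", "camel"),
    ("2014-09-04", "Vincent", "sloth"),
    ("2014-09-05", "Luis", "camel"),
    ("2014-09-05", "Andrew", "sloth"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source_table (date TEXT, name TEXT, animal_bought TEXT)")
conn.executemany("INSERT INTO source_table VALUES (?, ?, ?)", PURCHASES)

# Partitioning by (name, animal_bought) restarts the rank for every
# person/animal pair, so rank - 1 counts earlier purchases of the SAME animal.
QUERY = """
SELECT date, name, animal_bought,
       RANK() OVER (PARTITION BY name, animal_bought ORDER BY date ASC) - 1
           AS NumberBoughtBefore
FROM source_table
"""

# (date, name) is unique in this sample, so it works as a lookup key.
before = {(d, n): k for d, n, _animal, k in conn.execute(QUERY)}
for key in sorted(before):
    print(key, before[key])
```

Vincent's 2014-09-03 camel row gets 0 here even though he had already bought two sloths: this form only counts the animal bought in the current row, which is the limitation the cumulative-sum version addresses.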

If you want counts for both animals at the same time, don't use rank(); use a cumulative sum instead:

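-- Redshift requires a frame clause (ROWS ...) on aggregate window functions that use ORDER BY
SELECT date, name, animal_bought,
       sum(case when animal_bought = 'sloth' then 1 else 0 end) over (partition by name order by date rows between unbounded preceding and current row) as SlothsBefore,
       sum(case when animal_bought = 'camel' then 1 else 0 end) over (partition by name order by date rows between unbounded preceding and current row) as CamelsBefore
FROM source_table;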
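Running the cumulative sums on the question's data shows that the frame ends at the current row, so each total includes the current purchase. This is a sketch assuming Python's sqlite3 module with SQLite 3.25+ as a local stand-in for Redshift; the ROWS frame clause is written out explicitly because Redshift requires it for aggregate window functions with ORDER BY.

```python
import sqlite3

# Assumption: SQLite 3.25+ stands in for Redshift so the SQL can run locally.
# Sample purchases copied from the question: (date, name, animal_bought).
PURCHASES = [
    ("2014-09-01", "Vincent", "sloth"),
    ("2014-09-01", "Luis", "camel"),
    ("2014-09-02", "Vincent", "sloth"),
    ("2014-09-02", "Luis", "camel"),
    ("2014-09-02", "Kevin", "sloth"),
    ("2014-09-03", "Vincent", "camel"),
    ("2014-09-04", "Deo", "camel"),
    ("2014-09-04", "Vincent", "sloth"),
    ("2014-09-05", "Luis", "camel"),
    ("2014-09-05", "Andrew", "sloth"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source_table (date TEXT, name TEXT, animal_bought TEXT)")
conn.executemany("INSERT INTO source_table VALUES (?, ?, ?)", PURCHASES)

# Conditional running totals per person; the frame ends at CURRENT ROW,
# so the current row's own purchase is included in its total.
QUERY = """
SELECT date, name, animal_bought,
       sum(case when animal_bought = 'sloth' then 1 else 0 end)
           over (partition by name order by date
                 rows between unbounded preceding and current row) AS SlothsBefore,
       sum(case when animal_bought = 'camel' then 1 else 0 end)
           over (partition by name order by date
                 rows between unbounded preceding and current row) AS CamelsBefore
FROM source_table
"""

# (date, name) is unique in this sample, so it works as a lookup key.
totals = {(d, n): (s, c) for d, n, _animal, s, c in conn.execute(QUERY)}
for key in sorted(totals):
    print(key, totals[key])
```

Vincent's 2014-09-04 sloth row comes out as 3 sloths rather than 2, because his current sloth is counted too.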

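Edit: the cumulative sums above include the current row, so subtract the current row's own purchase to count only the purchases made before it: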

SELECT date, name, animal_bought,
       (sum(case when animal_bought = 'sloth' then 1 else 0 end) over (partition by name order by date rows between unbounded preceding and current row) -
        (case when animal_bought = 'sloth' then 1 else 0 end)
       ) as SlothsBefore,
       (sum(case when animal_bought = 'camel' then 1 else 0 end) over (partition by name order by date rows between unbounded preceding and current row) -
        (case when animal_bought = 'camel' then 1 else 0 end)
       ) as CamelsBefore
FROM source_table;
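To check that the subtraction reproduces the table from the question, the sketch below runs it on the sample rows and compares the result with the expected counts. It assumes Python's sqlite3 module with SQLite 3.25+ as a stand-in for Redshift. It also runs an alternative that is not part of the answer: ending the frame at 1 PRECEDING excludes the current row directly, but an empty frame sums to NULL, hence the COALESCE.

```python
import sqlite3

# Assumption: SQLite 3.25+ stands in for Redshift so the SQL can run locally.
# Sample purchases copied from the question: (date, name, animal_bought).
PURCHASES = [
    ("2014-09-01", "Vincent", "sloth"), ("2014-09-01", "Luis", "camel"),
    ("2014-09-02", "Vincent", "sloth"), ("2014-09-02", "Luis", "camel"),
    ("2014-09-02", "Kevin", "sloth"), ("2014-09-03", "Vincent", "camel"),
    ("2014-09-04", "Deo", "camel"), ("2014-09-04", "Vincent", "sloth"),
    ("2014-09-05", "Luis", "camel"), ("2014-09-05", "Andrew", "sloth"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source_table (date TEXT, name TEXT, animal_bought TEXT)")
conn.executemany("INSERT INTO source_table VALUES (?, ?, ?)", PURCHASES)

# The answer's approach: running total including the current row,
# minus the current row's own purchase.
QUERY = """
SELECT date, name, animal_bought,
       sum(case when animal_bought = 'sloth' then 1 else 0 end)
           over (partition by name order by date
                 rows between unbounded preceding and current row)
         - (case when animal_bought = 'sloth' then 1 else 0 end),
       sum(case when animal_bought = 'camel' then 1 else 0 end)
           over (partition by name order by date
                 rows between unbounded preceding and current row)
         - (case when animal_bought = 'camel' then 1 else 0 end)
FROM source_table
"""

# Alternative: stop the frame one row early; COALESCE turns the NULL
# produced by an empty frame (a person's first purchase) into 0.
VARIANT = """
SELECT date, name, animal_bought,
       COALESCE(sum(case when animal_bought = 'sloth' then 1 else 0 end)
           over (partition by name order by date
                 rows between unbounded preceding and 1 preceding), 0),
       COALESCE(sum(case when animal_bought = 'camel' then 1 else 0 end)
           over (partition by name order by date
                 rows between unbounded preceding and 1 preceding), 0)
FROM source_table
"""

def counts(sql):
    # (date, name) is unique in this sample, so it works as a lookup key.
    return {(d, n): (s, c) for d, n, _animal, s, c in conn.execute(sql)}

# (sloths before, camels before) per row, from the question's table.
EXPECTED = {
    ("2014-09-01", "Vincent"): (0, 0), ("2014-09-01", "Luis"): (0, 0),
    ("2014-09-02", "Vincent"): (1, 0), ("2014-09-02", "Luis"): (0, 1),
    ("2014-09-02", "Kevin"): (0, 0), ("2014-09-03", "Vincent"): (2, 0),
    ("2014-09-04", "Deo"): (0, 0), ("2014-09-04", "Vincent"): (2, 1),
    ("2014-09-05", "Luis"): (0, 2), ("2014-09-05", "Andrew"): (0, 0),
}

result = counts(QUERY)
variant = counts(VARIANT)
print(result == EXPECTED, variant == EXPECTED)
```

Both forms give the same counts on this data. Note that with ROWS frames, two purchases by the same person on the same date would be ordered arbitrarily, so ties on date would need an extra ORDER BY column to make the counts deterministic.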

Edited on 2021-06-08
