过滤重复记录

用户391986

我正在尝试找出person_idaccount_id另一个人相同的独特列表

需要说明的是person_id,最后一次给出重叠的人account_id不应包括在该列表中。

        id          person_id     account_id
   +------------------------------------------+
   |            |             |               |
   |     1      |     1       |      10       |
   +------------------------------------------+
   |            |             |               |
   |     2      |     2       |      10       |
   +------------------------------------------+
   |            |             |               |
   |     3      |     3       |      11       |
   +------------------------------------------+

请注意:这是一个稍微简化的示例,不要从字面上理解。


这是我目前有的查询

SELECT STRING_AGG(person_id, ',')
FROM accounts_map
WHERE created_at > '2001-01-10' -- ignore records smaller than 2001-01-10
GROUP BY account_id -- group by account id
HAVING count(*) > 1 -- any account that have multiple matches

我不知道的是

  1. 如何为每个account_id匹配项重叠,排除最新记录
  2. 如何忽略具有account_id标记为null的记录,以某种方式<> NULL不起作用
CREATE TABLE accounts_map
    ([id] int, [person_id] int, [account_id] int, [created_at] DATETIME)
;

INSERT INTO accounts_map
    ([id], [person_id], [account_id], [created_at])
VALUES
    (1, 1, 10, '2001-01-10'),
    (2, 1, 11, '2001-01-10'),
    (2, 1, 12, '2001-01-10'),

    (3, 2, 20, '2019-01-11'),
    (4, 2, 21, '2019-01-11'),
    (5, 2, 22, '2019-01-11'),

    (3, 3, 30, '2019-01-12'),
    (4, 3, 31, '2019-01-12'),
    (5, 3, 32, '2019-01-12'),

    -- same account ids as records with id (1,2,3) but inserted in different order
    (6, 4, 12, '2019-01-13'),
    (7, 4, 11, '2019-01-13'),
    (8, 4, 10, '2019-01-13'),

    -- same account ids as records with id (3,4,5)
    (9, 5, 20, '2019-01-14'),
    (10, 5, 21, '2019-01-14'),
    (11, 5, 22, '2019-01-14'),

    -- same account ids as records with id (9,10,11)
    (12, 6, 20, '2019-01-15'),
    (13, 6, 21, '2019-01-15'),
    (14, 6, 22, '2019-01-15')
;

预期产量

id
---
1 (not expected if date range cutoff specified) created_at > '2001-01-10'
2
4 (not expected if date range cutoff specified) created_at > '2001-01-10'
5

如果其帐户重叠,则被授予该帐户的最新用户将成为破坏旧帐户的有效用户。

http://sqlfiddle.com/#!18/f53e5/1

专线小巴

在子查询中,您可以在共享相同记录的记录组中进行窗口计数,account_id并用于row_number()按日期对它们进行排名。然后,外部查询对窗口计数大于1的记录进行过滤,并返回除最新记录以外的所有记录:

select person_id, account_id, created_at
from (
    select 
        t.*, 
        row_number() over(partition by account_id order by created_at desc) rn,
        count(*) over(partition by account_id) cnt
    from accounts_map t
) t
where cnt > 1 and rn > 1

DB Fiddle上的演示

person_id | account_id | created_at          
--------:| ---------:| :------------------ 
        1 | 10 | 10/01/2001 00:00:00 
        1 | 11 | 10/01/2001 00:00:00 
        1 | 12 | 10/01/2001 00:00:00 
        5 | 20 | 14/01/2019 00:00:00 
        2 | 20 | 11/01/2019 00:00:00 
        5 | 21 | 14/01/2019 00:00:00 
        2 | 21 | 11/01/2019 00:00:00 
        5 | 22 | 14/01/2019 00:00:00 
        2 | 22 | 11/01/2019 00:00:00

注意:您提供了示例数据,但不幸的是没有提供相关的预期结果,以供我们验证查询的输出。

本文收集自互联网,转载请注明来源。

如有侵权,请联系[email protected] 删除。

编辑于
0

我来说两句

0条评论
登录后参与评论

相关文章