How to ignore identical successive records within a group using SQL

Posted by sbrbot in Dev

sbrbot

I have the following data:

ItemID | ProdID | ItemTypeID
=======+========+===========
1001   | 100    | A
1002   | 100    | B
1003   | 100    | A
1004   | 100    | B
1005   | 100    | B  <- successive itemtype (should be excluded)
1006   | 100    | C
1007   | 200    | C
1008   | 200    | A

I want to list the ItemTypeIDs in each group (LISTAGG), but without identical successive IDs. So my result should look like this:

ProdID | ItemTypes
=======+==========
100    | A,B,A,B,C <- not A,B,A,B,B,C (successive B)
200    | C,A

Gordon Linoff

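This is tricky. listagg() doesn't even allow distinct, so all the work needs to be done in a subquery.

You can identify runs of successive identical ItemTypeId values using the difference-of-row-numbers approach. Once the runs are identified, aggregate each run down to a single row and then apply listagg():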

select ProdId,
       listagg(ItemTypeId, ',') within group (order by seqnum) as ItemTypes
from (-- one row per run of identical successive ItemTypeIds
      select ProdId, ItemTypeId, count(*) as NumItems,
             row_number() over (partition by ProdId order by min(ItemId)) as seqnum
      from (select t.*,
                   -- this difference is constant within each run of the same ItemTypeId
                   (row_number() over (partition by ProdId order by ItemId) -
                    row_number() over (partition by ProdId, ItemTypeId order by ItemId)
                   ) as grp
            from t
           ) t
      group by ProdId, ItemTypeId, grp
     ) t
group by ProdId;
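
To see why the row-number difference marks the runs, it may help to look at the intermediate values on the question's sample data. The following is only a sketch: it assumes Oracle syntax (`from dual`), inlines the sample rows as a CTE named t, and uses the made-up column names rn_all and rn_type for the two row numbers.

```sql
-- Assumption: Oracle syntax; the CTE "t" just inlines the sample rows from the question.
with t as (
  select 1001 as ItemId, 100 as ProdId, 'A' as ItemTypeId from dual union all
  select 1002, 100, 'B' from dual union all
  select 1003, 100, 'A' from dual union all
  select 1004, 100, 'B' from dual union all
  select 1005, 100, 'B' from dual union all
  select 1006, 100, 'C' from dual union all
  select 1007, 200, 'C' from dual union all
  select 1008, 200, 'A' from dual
)
select ItemId, ProdId, ItemTypeId,
       -- position of the row within its product
       row_number() over (partition by ProdId order by ItemId) as rn_all,
       -- position of the row within its product and item type
       row_number() over (partition by ProdId, ItemTypeId order by ItemId) as rn_type,
       -- both counters advance together inside a run, so the difference stays constant
       row_number() over (partition by ProdId order by ItemId)
       - row_number() over (partition by ProdId, ItemTypeId order by ItemId) as grp
from t
order by ItemId;

-- ItemId | ProdId | ItemTypeId | rn_all | rn_type | grp
-- 1001   | 100    | A          | 1      | 1       | 0
-- 1002   | 100    | B          | 2      | 1       | 1
-- 1003   | 100    | A          | 3      | 2       | 1
-- 1004   | 100    | B          | 4      | 2       | 2
-- 1005   | 100    | B          | 5      | 3       | 2
-- 1006   | 100    | C          | 6      | 1       | 5
-- 1007   | 200    | C          | 1      | 1       | 0
-- 1008   | 200    | A          | 2      | 1       | 1
```

Note that grp alone is not unique (1002 and 1003 both get 1), which is why the grouping uses ItemTypeId together with grp. Only 1004 and 1005 share the same (ProdId, ItemTypeId, grp) combination, so they collapse into one row with NumItems = 2.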

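Edit:

Another way to solve this is to use lag() to find where each new group starts. The previous method lets you get the counts. This method may be easier to understand: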

select ProdId,
       listagg(ItemTypeId, ',') within group (order by ItemId) as ItemTypes
from (select t.*
      from (select t.*,
                   -- ItemTypeId of the previous row within the same product
                   lag(ItemTypeId) over (partition by ProdId order by ItemId) as prev_ItemTypeId
            from t
           ) t
      -- keep only the first row of each run
      where prev_ItemTypeId is null or prev_ItemTypeId <> ItemTypeId
     ) t
group by ProdId;
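
As a quick check, the lag() idea can be run against the question's sample rows. This is a sketch under a few assumptions: Oracle syntax, the sample data inlined as a CTE named t, a helper CTE with the made-up name flagged, and ItemTypeId (not ItemId) being aggregated so the output matches the requested ItemTypes column.

```sql
-- Assumption: Oracle syntax; the CTE "t" inlines the sample rows from the question.
with t as (
  select 1001 as ItemId, 100 as ProdId, 'A' as ItemTypeId from dual union all
  select 1002, 100, 'B' from dual union all
  select 1003, 100, 'A' from dual union all
  select 1004, 100, 'B' from dual union all
  select 1005, 100, 'B' from dual union all
  select 1006, 100, 'C' from dual union all
  select 1007, 200, 'C' from dual union all
  select 1008, 200, 'A' from dual
),
flagged as (  -- hypothetical name: every row plus the previous row's item type
  select t.*,
         lag(ItemTypeId) over (partition by ProdId order by ItemId) as prev_ItemTypeId
  from t
)
select ProdId,
       listagg(ItemTypeId, ',') within group (order by ItemId) as ItemTypes
from flagged
-- a row starts a new run if it is the first in its product or its type changed
where prev_ItemTypeId is null or prev_ItemTypeId <> ItemTypeId
group by ProdId
order by ProdId;

-- ProdID | ItemTypes
-- 100    | A,B,A,B,C
-- 200    | C,A
```

Row 1005 is the only one dropped, because its previous ItemTypeId is also B. If ItemTypeId could be NULL in the real table, the `<>` comparison would need extra handling, since comparisons with NULL are never true.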
