Oracle: deduplicate rows based on the max of 2 columns

Posted by debugcn in Dev

Username

I'm wondering if anyone knows an efficient way to deduplicate records in a large dataset with Oracle SQL, based on the maximum values of 2 attributes.

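In the hypothetical example below, I want to remove all duplicate COMPANYID / CHILDID pairs by first keeping the row with the largest TRANSACTIONID. If duplicates still remain (rows sharing that TRANSACTIONID), keep the one with the largest BATCHID.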

Note: TRANSACTIONID and BATCHID may contain NULLs, which should be treated as the lowest value.

Table: Transactions

CompanyID    ChildID    TransactionID    BatchID    Product Details
ABC          EFG        306              (null)     Product1
ABC          EFG        306              54         Product2
ZXY          BFG        405              003        Product1
ZXY          BFG        405              004        Product2
ZXY          BFG        407              (null)     Product3

Expected result:

ABC | EFG | 306 | 54 | Product 2    -- selected on the basis of the highest transactionid and batchid
ZXY | BFG | 405 | 407 | Product 3   -- selected on the basis of the highest transactionid

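My idea is simple:
1) Use MAX on transactionid, then run a subquery on that result to get the MAX of batchID.
2) Join the "deduplicated" set back to the original set to get the product information.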
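To make the two-step plan concrete, here is a minimal Python sketch of it (not SQL), assuming the sample rows from the table above with None standing in for NULL; the names `ROWS`, `nz` and `two_step_dedup` are hypothetical. Step 1 takes the max transactionid per pair, then the max batchid among rows with that transactionid; step 2 joins back to the original rows to recover the product.

```python
# Hypothetical sample rows mirroring the question's table:
# (company_id, child_id, transaction_id, batch_id, product_details); None plays NULL
ROWS = [
    ("ABC", "EFG", 306, None, "Product1"),
    ("ABC", "EFG", 306, 54, "Product2"),
    ("ZXY", "BFG", 405, 3, "Product1"),
    ("ZXY", "BFG", 405, 4, "Product2"),
    ("ZXY", "BFG", 407, None, "Product3"),
]


def nz(value):
    """Treat NULL (None) as lower than any real value."""
    return float("-inf") if value is None else value


def two_step_dedup(rows):
    # Step 1a: MAX(transaction_id) per (company_id, child_id) pair
    max_tid = {}
    for company, child, tid, _bid, _prod in rows:
        key = (company, child)
        if key not in max_tid or nz(tid) > nz(max_tid[key]):
            max_tid[key] = tid

    # Step 1b: MAX(batch_id) among the rows carrying that transaction_id
    max_bid = {}
    for company, child, tid, bid, _prod in rows:
        key = (company, child)
        if tid == max_tid[key] and (key not in max_bid or nz(bid) > nz(max_bid[key])):
            max_bid[key] = bid

    # Step 2: join the deduplicated keys back to the original rows for the product
    return [
        row for row in rows
        if row[2] == max_tid[(row[0], row[1])]
        and row[3] == max_bid[(row[0], row[1])]
    ]


print(two_step_dedup(ROWS))
```

Two caveats carry over to SQL. Python's `==` treats `None == None` as true, but SQL's `=` never matches NULL, so the join in step 2 needs extra NULL handling there. And if two rows tie on both columns, the join returns both of them.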

Does anyone know a more efficient or cleaner way to do this, and a better way to handle the NULLs?

Any feedback is appreciated.

Emmanuel

In Oracle 11g, you can use a query like this:

-- Sample data standing in for the Transactions table
with w(CompanyID, ChildID, transactionid, BatchID, Product_Details) as
(
  select 'ABC', 'EFG', 306, null, 'Product1' from dual
  union all
  select 'ABC', 'EFG', 306, 54, 'Product2' from dual
  union all
  select 'ZXY', 'BFG', 405, 003, 'Product1' from dual
  union all
  select 'ZXY', 'BFG', 405, 004, 'Product2' from dual
  union all
  select 'ZXY', 'BFG', 407, null, 'Product3' from dual
)
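-- One row per (CompanyID, ChildID): each column is read from the row(s) that sort
-- last by transactionid, then batchid, with NULLs mapped to 0 by NVL
select w.CompanyID,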
       w.ChildID,
       max(w.transactionid)   keep (dense_rank last order by nvl(w.transactionid, 0), nvl(w.batchid, 0)) max_transactionid,
       max(w.batchid)         keep (dense_rank last order by nvl(w.transactionid, 0), nvl(w.batchid, 0)) max_batchid,
       max(w.Product_Details) keep (dense_rank last order by nvl(w.transactionid, 0), nvl(w.batchid, 0)) max_Product_Details
from w
group by w.CompanyID, w.ChildID
;

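The NVL calls handle the NULL cases: NVL(x, 0) maps NULL to 0, so a NULL transactionid or batchid sorts below any positive ID. Here is the output (it doesn't match your expected result, but I wrote the query as I understood your requirements):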

COMPANYID    CHILDID    MAX_TRANSACTIONID    MAX_BATCHID    MAX_PRODUCT_DETAILS
ABC          EFG        306                  54             Product2
ZXY          BFG        407                                 Product3
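Outside the database, the same logic can be modeled as a single pass per group that keeps the row with the largest composite key `(nvl(transactionid, 0), nvl(batchid, 0))`. Python compares tuples lexicographically, just like a two-column ORDER BY. This is a minimal sketch assuming the sample rows above and non-negative IDs (so that 0 really is the lowest value); `nvl`, `sort_key` and `dedup_last` are hypothetical helpers, not Oracle APIs.

```python
# Hypothetical sample rows; None plays NULL
ROWS = [
    ("ABC", "EFG", 306, None, "Product1"),
    ("ABC", "EFG", 306, 54, "Product2"),
    ("ZXY", "BFG", 405, 3, "Product1"),
    ("ZXY", "BFG", 405, 4, "Product2"),
    ("ZXY", "BFG", 407, None, "Product3"),
]


def nvl(value, default):
    """Mimic Oracle NVL: return default when value is NULL (None)."""
    return default if value is None else value


def sort_key(row):
    # Same ordering as: ORDER BY nvl(transactionid, 0), nvl(batchid, 0)
    _company, _child, tid, bid, _prod = row
    return (nvl(tid, 0), nvl(bid, 0))


def dedup_last(rows):
    """One row per (company_id, child_id): the row whose sort_key is largest."""
    best = {}
    for row in rows:
        key = (row[0], row[1])
        # Strict '>' keeps the first-seen row when two rows tie on sort_key
        if key not in best or sort_key(row) > sort_key(best[key]):
            best[key] = row
    return list(best.values())


for row in dedup_last(ROWS):
    print(row)
```

One difference from the Oracle query: this sketch returns one whole row and keeps the first of any tied rows. KEEP (DENSE_RANK LAST ...) instead applies MAX to each column separately over all tied rows.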

Edit: let me try to explain DENSE_RANK and LAST a bit further. Within a GROUP BY, this syntax behaves like an aggregate function (such as SUM or AVG):

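- Within a group, ORDER BY defines the sort order (here transactionid, then batchid).
- DENSE_RANK LAST then says you only look at the last rank in that ordering (several rows can actually share that rank).
- MAX takes the highest value among those top-ranked rows. In most cases there is only one such row, so MAX may look useless, but it isn't: it decides what is returned when rows tie. That's why you often see MIN with DENSE_RANK FIRST, or MAX with DENSE_RANK LAST.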
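To see why the outer MAX matters, the sketch below emulates `AGG(value) KEEP (DENSE_RANK LAST|FIRST ORDER BY ...)` for a single group in Python. `keep_dense_rank` is a hypothetical helper, and the extra "Product4" row is made up purely to create a tie on (transactionid, batchid). As Oracle aggregates do, the helper ignores NULLs in the aggregated values.

```python
def nvl(value, default):
    """Mimic Oracle NVL: return default when value is NULL (None)."""
    return default if value is None else value


def order_key(row):
    # ORDER BY nvl(transactionid, 0), nvl(batchid, 0)
    return (nvl(row[2], 0), nvl(row[3], 0))


def keep_dense_rank(rows, key, value, agg, last=True):
    """Emulate AGG(value) KEEP (DENSE_RANK LAST|FIRST ORDER BY key) for one group."""
    keys = [key(r) for r in rows]
    target = max(keys) if last else min(keys)
    # Every row sharing the top (or bottom) rank is kept, not just one
    tied = [value(r) for r, k in zip(rows, keys) if k == target]
    tied = [v for v in tied if v is not None]  # aggregates skip NULLs
    return agg(tied) if tied else None


# One group; the "Product4" row is hypothetical and ties with "Product3"
GROUP = [
    ("ZXY", "BFG", 405, 4, "Product2"),
    ("ZXY", "BFG", 407, None, "Product3"),
    ("ZXY", "BFG", 407, None, "Product4"),
]

print(keep_dense_rank(GROUP, order_key, lambda r: r[4], max))
print(keep_dense_rank(GROUP, order_key, lambda r: r[4], min, last=False))
```

With the tie, MAX returns "Product4" and MIN would return "Product3". Either way the result depends only on the tied values, not on the order in which rows arrive, which is what makes the aggregate deterministic.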