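I have a series of records containing some information (a product type) with temporal validity.
I would like to merge adjacent validity periods together, provided the grouping information (the product type) stays the same. I cannot use a simple GROUP BY with MIN and MAX, because some product types (A, in the example) can "go away" and "come back".
Using Oracle 11g.
A similar question for MySQL is: How can I do a contiguous group by in MySQL?
Input data: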
| PRODUCT | START_DATE | END_DATE |
|---------|----------------------------------|----------------------------------|
| A | July, 01 2013 00:00:00+0000 | July, 31 2013 00:00:00+0000 |
| A | August, 01 2013 00:00:00+0000 | August, 31 2013 00:00:00+0000 |
| A | September, 01 2013 00:00:00+0000 | September, 30 2013 00:00:00+0000 |
| B | October, 01 2013 00:00:00+0000 | October, 31 2013 00:00:00+0000 |
| B | November, 01 2013 00:00:00+0000 | November, 30 2013 00:00:00+0000 |
| A | December, 01 2013 00:00:00+0000 | December, 31 2013 00:00:00+0000 |
| A | January, 01 2014 00:00:00+0000 | January, 31 2014 00:00:00+0000 |
| A | February, 01 2014 00:00:00+0000 | February, 28 2014 00:00:00+0000 |
| A | March, 01 2014 00:00:00+0000 | March, 31 2014 00:00:00+0000 |
Expected results:
| PRODUCT | START_DATE | END_DATE |
|---------|---------------------------------|----------------------------------|
| A | July, 01 2013 00:00:00+0000 | September, 30 2013 00:00:00+0000 |
| B | October, 01 2013 00:00:00+0000 | November, 30 2013 00:00:00+0000 |
| A | December, 01 2013 00:00:00+0000 | March, 31 2014 00:00:00+0000 |
See the complete SQL Fiddle.
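For anyone who wants to reproduce this without the fiddle, a minimal setup might look like the sketch below. The table name t is the one used by the queries in this thread; the column types (VARCHAR2 and DATE) are assumptions, since the real schema is not shown here. The last statement is the simple GROUP BY that does not work for this case.

```sql
-- Assumed schema: the real column types are not shown in the thread.
create table t (
  product    varchar2(10),
  start_date date,
  end_date   date
);

insert into t values ('A', date '2013-07-01', date '2013-07-31');
insert into t values ('A', date '2013-08-01', date '2013-08-31');
insert into t values ('A', date '2013-09-01', date '2013-09-30');
insert into t values ('B', date '2013-10-01', date '2013-10-31');
insert into t values ('B', date '2013-11-01', date '2013-11-30');
insert into t values ('A', date '2013-12-01', date '2013-12-31');
insert into t values ('A', date '2014-01-01', date '2014-01-31');
insert into t values ('A', date '2014-02-01', date '2014-02-28');
insert into t values ('A', date '2014-03-01', date '2014-03-31');

-- The naive aggregate: one row per product, regardless of gaps.
select product,
       min(start_date) as first_start,
       max(end_date)   as last_end
from t
group by product
order by product;
```

On these rows the aggregate returns a single row for A running from 01-JUL-13 to 31-MAR-14, which swallows the October and November period where the product was B.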
This is a gaps-and-islands problem. There are various ways to approach it; this one uses the lead and lag analytic functions:
    select distinct product,
      -- an end-only row takes its start from the preceding start-only row
      case when start_date is null then lag(start_date)
        over (partition by product order by rn) else start_date end as start_date,
      -- a start-only row takes its end from the following end-only row
      case when end_date is null then lead(end_date)
        over (partition by product order by rn) else end_date end as end_date
    from (
      select product, start_date, end_date, rn
      from (
        select t.product,
          -- keep start_date only when no earlier row ends the day before
          case when lag(end_date)
              over (partition by product order by start_date) is null
            or lag(end_date)
              over (partition by product order by start_date) != start_date - 1
            then start_date end as start_date,
          -- keep end_date only when no later row starts the day after
          case when lead(start_date)
              over (partition by product order by start_date) is null
            or lead(start_date)
              over (partition by product order by start_date) != end_date + 1
            then end_date end as end_date,
          row_number() over (partition by product order by start_date) as rn
        from t
      )
      -- drop mid-period rows, where both dates were blanked out
      where start_date is not null or end_date is not null
    )
    order by start_date, product;
    PRODUCT START_DATE END_DATE
    ------- ---------- ---------
    A       01-JUL-13  30-SEP-13
    B       01-OCT-13  30-NOV-13
    A       01-DEC-13  31-MAR-14
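As one of those other ways, here is a sketch of a different technique (not the lead/lag pairing used above): flag each row that starts a new island, turn the flags into a group number with a running sum, then aggregate per group. The aliases is_start, grp, island_start and island_end are made up for illustration; t and its columns are as in the question. It assumes whole-day DATE values and no overlapping periods within a product.

```sql
select product,
       min(start_date) as island_start,
       max(end_date)   as island_end
from (
  select product, start_date, end_date,
         -- running count of island starts = island number within the product
         sum(is_start) over (partition by product order by start_date) as grp
  from (
    select product, start_date, end_date,
           -- 0 when the previous row of this product ends the day before,
           -- 1 otherwise (including the first row, where lag() is null)
           case when lag(end_date)
                       over (partition by product order by start_date)
                     = start_date - 1
                then 0 else 1 end as is_start
    from t
  )
)
group by product, grp
order by island_start, product;
```

On the sample data this should give the same three periods as the query above. Whether it is preferable is mostly a matter of taste; it avoids the distinct, at the cost of a group by.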
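In the lead/lag query, the innermost query looks at the preceding and following records for the product, and only retains the start and/or end date if the records are not contiguous: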
    select t.product,
      case when lag(end_date)
          over (partition by product order by start_date) is null
        or lag(end_date)
          over (partition by product order by start_date) != start_date - 1
        then start_date end as start_date,
      case when lead(start_date)
          over (partition by product order by start_date) is null
        or lead(start_date)
          over (partition by product order by start_date) != end_date + 1
        then end_date end as end_date
    from t;
    PRODUCT START_DATE END_DATE
    ------- ---------- ---------
    A       01-JUL-13
    A
    A                  30-SEP-13
    A       01-DEC-13
    A
    A
    A                  31-MAR-14
    B       01-OCT-13
    B                  30-NOV-13
The next level of select removes the rows that are mid-period, where both dates were blanked out by the inner query, which gives:
    PRODUCT START_DATE END_DATE
    ------- ---------- ---------
    A       01-JUL-13
    A                  30-SEP-13
    A       01-DEC-13
    A                  31-MAR-14
    B       01-OCT-13
    B                  30-NOV-13
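The outer query then collapses those adjacent pairs. I've taken the easy route of creating duplicates and then eliminating them with distinct, but you could do it other ways, such as putting both values into one row of each pair, leaving both values null in the other, and then eliminating the null rows with another layer of select. I think distinct is fine here, though.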
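A rough sketch of a variant on that idea, in case the distinct bothers you: the start-only row of each pair picks up the end date from the row that follows it, and the leftover end-only row is discarded by filtering on start_date in one more layer of select. This is only an illustration of the approach, not something tested against your real data; the inner two levels are the same as in the main query.

```sql
select product, start_date, end_date
from (
  select product,
         start_date,
         -- a start-only row borrows the end date from the row after it;
         -- a row that already has both dates keeps its own end date
         nvl(end_date,
             lead(end_date) over (partition by product order by rn)) as end_date
  from (
    select product, start_date, end_date, rn
    from (
      select t.product,
        case when lag(end_date)
            over (partition by product order by start_date) is null
          or lag(end_date)
            over (partition by product order by start_date) != start_date - 1
          then start_date end as start_date,
        case when lead(start_date)
            over (partition by product order by start_date) is null
          or lead(start_date)
            over (partition by product order by start_date) != end_date + 1
          then end_date end as end_date,
        row_number() over (partition by product order by start_date) as rn
      from t
    )
    where start_date is not null or end_date is not null
  )
)
-- end-only rows have served their purpose and are dropped here
where start_date is not null
order by start_date, product;
```

The mid-period rows have to be filtered out before the lead() is applied, so that the row following a start-only row is always the matching end-only row.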
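If your real-world use case has times, not just dates, then you'll need to adjust the comparison in the inner query; rather than +/- 1, use an interval of 1 second perhaps, or 1/86400 if you prefer, but it depends on the precision of your values.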
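For example, the inner query might be adjusted along these lines. This sketch assumes DATE columns that hold values to the second, where a period ends on its last second (say 23:59:59) and a contiguous period starts on the very next second; if your data uses a different convention, the offset changes accordingly.

```sql
select t.product,
       -- contiguous now means "previous row ends exactly one second earlier"
       case when lag(end_date)
                   over (partition by product order by start_date) is null
              or lag(end_date)
                   over (partition by product order by start_date)
                 != start_date - interval '1' second
            then start_date end as start_date,
       -- and "next row starts exactly one second later"
       case when lead(start_date)
                   over (partition by product order by start_date) is null
              or lead(start_date)
                   over (partition by product order by start_date)
                 != end_date + interval '1' second
            then end_date end as end_date,
       row_number() over (partition by product order by start_date) as rn
from t;
```

With DATE columns, start_date - 1/86400 and end_date + 1/86400 would express the same one-second offset; the interval literal just states the unit explicitly.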