Suppose i have a table in hive like so:
|Id|Data |Data2 |Groupkey|
|1 |One | |Group1 |
|2 |Two |Stuff |Group1 |
|3 |Shoes|Some |Group2 |
|4 |four |Stuff |Group2 |
|5 |Three|Notme |Group3 |
For each group that contains 'Stuff' in Data2
i want to get the row that has Groupkey
and Data
from the row other than the Stuff
, and Data2
from 'Stuff' row.
So resulting data set would look something like
|Group |Data |Data2|
|Group1|One |Two |
|Group2|Shoes|four |
I was hoping to get something going with a GROUP BY
, i started going with
SELECT Data, Groupkey FROM (SELECT Data, GroupKey FROM MyTable GROUP BY Groupkey) WHERE Data2 <> 'Stuff'
but this fails suggesting i need to include Data in the group by but that is not what i want to group by?
And i'm not sure how to select just the groups containing one row with certain data.
select Groupkey as `Group`
,min (case when Data2 <> 'Stuff' then Data end) as Data
,min (case when Data2 = 'Stuff' then Data end) as Data2
from MyTable
group by Groupkey
having count (case when Data2 = 'Stuff' then 1 end) > 0
;
+--------+-------+-------+
| group | data | data2 |
+--------+-------+-------+
| Group1 | One | Two |
| Group2 | Shoes | four |
+--------+-------+-------+
Collected from the Internet
Please contact [email protected] to delete if infringement.
Comments