我想加入三个表来检查每个组上共享和非共享用户的数量。理想情况下,我想产生与此类似的东西。
我有一个大表,用于存储许多类的数据,包括W,D和P。要评估用户是否参加了一个以上的类,我将为每个组创建一个子查询,然后full join
合并这些结果表。我的问题是我必须在同一主键(userpid
)上连接所有三个表。我试图以这种方式编写查询,但是Hive收到一条错误消息,提示IS NULL
操作员仅接受一个参数。
select
isnull(iq1.userpid, iq2.userpid) userpid,
groupW,
groupD,
sum( case when iq2.userpid is not null then 1 else 0 end ) groupP
from
( select
isnull(q1.userpid, q2.userpid) userpid,
( case when q1.userpid is not null then 1 else 0 end ) groupW,
( case when q1.userpid is not null then 1 else 0 end ) groupD
from
-- users who attended class W
( select distinct userpid
from store.attendance
where classid = 1165
and datehour between '2014-04-28 00:00:00' and '2014-04-30 00:00:00'
) q1
full outer join
-- users who attended class D
( select distinct userpid
from store.attendance
where classid= 1174
and datehour between '2014-04-28 00:00:00' and '2014-04-30 00:00:00'
) q2
on q1.userpid = q2.userpid ) iq1
full outer join
-- users who attended class P
( select distinct userpid
from store.attendance
where classid = 1173
and datehour between '2014-04-28 00:00:00' and '2014-04-30 00:00:00'
) iq2
on iq2.userpid = iq1.userpid
;
还有另一种函数或方式可以用来实现相同目标的查询编写吗?然后,我可以按类别,使用一系列case when
调用或在R或Python中处理它,来按类查看共享的非共享用户数。
我建议使用两层聚合进行此操作。首先为每个类创建标志。然后通过这些标志进行聚合:
select has_1165, has_1174, has_1173, count(*) as cnt, min(userpid), max(userpid)
from (select userpid,
max(case when classid = 1165 then 1 else 0 end) as has_1165,
max(case when classid = 1174 then 1 else 0 end) as has_1174,
max(case when classid = 1173 then 1 else 0 end) as has_1173
from store.attendance
where classid in (1165, 1173, 1174) and
datehour between '2014-04-28 00:00:00' and '2014-04-30 00:00:00'
group by userpid
) a
group by has_1165, has_1174, has_1173;
本文收集自互联网,转载请注明来源。
如有侵权,请联系[email protected] 删除。
我来说两句