我在 Hive 中有两个表:empSrc
和empTrg
:
> select * from empSrc;
+---------------+--------------+-------------+--------------+--+
| empsrc.empid | empsrc.dept | empsrc.ph | empsrc.role |
+---------------+--------------+-------------+--------------+--+
| e1 | dev | 9999911111 | SE |
| e2 | admin | 6677889933 | SE |
+---------------+--------------+-------------+--------------+--+
2 rows selected (0.872 seconds)
> select * from empTrg;
+---------------+--------------+-------------+--------------+--------------------+----------------+--+
| emptrg.empid | emptrg.dept | emptrg.ph | emptrg.role | emptrg.dml_action | emptrg.active |
+---------------+--------------+-------------+--------------+--------------------+----------------+--+
| e1 | dev | 9999911111 | SE | I | A |
+---------------+--------------+-------------+--------------+--------------------+----------------+--+
我想找到是在记录empSrc
,但缺少的empTrg
。
我的查询工作正常:
select S.* from empSrc S
where S.empid not in (select T.empid from empTrg T);
+----------+---------+-------------+---------+--+
| s.empid | s.dept | s.ph | s.role |
+----------+---------+-------------+---------+--+
| e2 | admin | 6677889933 | SE |
+----------+---------+-------------+---------+--+
问题是此查询正在生成交叉产品。我可以使用
任何等效的LEFT OUTER JOIN
查询吗?
LEFT/RIGHT OUTER JOIN 对性能有帮助吗?
上面的场景是一个演示场景,在实际数据中我有~1200万条记录。
该查询select S.* from empSrc S where S.empid not in (select T.empid from empTrg T)
实际上并不执行交叉联接。没有问题。
可以复制相同的逻辑 not exists
select s.*
from empSrc s
where not exists (select 1 from empTrg t where t.empid = s.empid)
或left join
.
select s.*
from empSrc s
left join empTrg t on t.empid = s.empid
where t.empid is null --condition to check for non existent records
本文收集自互联网,转载请注明来源。
如有侵权,请联系[email protected] 删除。
我来说两句