如何转置以下PySpark数据帧?
以下是pyspark数据框。
+----+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+----------------+--------+------+
|srab|srsbtp|avgm1|avgm2|avgm3|avgm4|avgm4|avgm6|avgm7|avgm8|avgm9| avgm10| avgm11|avgm12|
+----+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+----------------+--------+------+
|2389| D| null| null| null| null| null| null| null| null| null| null| null| null|
|2389| C| null| null| null| null| null| null| null| null| null|54674.1935483871|156820.0| null|
+----+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+----------------+--------+------+
我想将上面的数据框转换为下表
所需输出:
srab month D C
2389 avgm1 null null
2389 avgm2 null null
2389 avgm3 null null
2389 avgm4 null null
2389 avgm5 null null
2389 avgm6 null null
2389 avgm7 null null
2389 avgm8 null null
2389 avgm9 null null
2389 avgm10 null 54674.19355
2389 avgm11 null 156820
2389 avgm12 null null
在Spark SQL中,可以使用union all
和条件聚合来取消透视/透视:
select srab, month,
max(case when srsbtp = 'D' then avgm1 end) as d,
max(case when srsbtp = 'C' then avgm1 end) as c
from (
select srab, srsbtp, 'avgm1' as month, avgm1 from mytable
union all srab, srsbtp, 'avgm2', avgm2 from mytable
union all srab, srsbtp, 'avgm3', avgm3 from mytable
...
) t
gorup by srab, month
本文收集自互联网,转载请注明来源。
如有侵权,请联系[email protected] 删除。
我来说两句