I'm trying to use the groupby function from the itertools library. For two lists, the following code works perfectly:
from itertools import groupby
from operator import itemgetter

date = ['2019/07/25', '2019/07/25', '2019/07/27', '2019/07/28', '2019/07/28', '2019/07/28', '2019/07/28', '2019/07/28']
count1 = [1, 3, 4, 0, 2, 0, 1, 1]
count2 = [2, 1, 3, 1, 1, 1, 0, 0]

def group_data(date, count):
    group = []
    # groupby only merges adjacent rows, so date must already be sorted
    for k, g in groupby(zip(date, count), itemgetter(0)):
        # transpose the rows of this group and sum the count column
        group.append((k, sum(list(zip(*g))[1])))
    # sorted() returns a new list, so its result has to be returned
    return sorted(group)

print(group_data(date, count1))
[('2019/07/25', 4), ('2019/07/27', 4), ('2019/07/28', 4)]
But how can I rewrite it for three lists?
group_data(date, count1, count2)
should return:
[('2019/07/25', 4, 3), ('2019/07/27', 4, 3), ('2019/07/28', 4, 3)]
In other words, I want the same result as the pandas groupby function, but using itertools and getting a list of tuples:
import pandas as pd

df = pd.DataFrame({'date': date, 'count1': count1, 'count2': count2})
# select several columns with a list (double brackets)
df.groupby('date')[['count1', 'count2']].sum()

            count1  count2
date
2019/07/25       4       3
2019/07/27       4       3
2019/07/28       4       3
If you only need it for three lists, this works:

def group_data(date, count1, count2):
    group = []
    for k, g in groupby(zip(date, count1, count2), itemgetter(0)):
        # transpose the rows of this group: g12[1] holds count1, g12[2] holds count2
        g12 = list(zip(*g))
        group.append((k, sum(g12[1]), sum(g12[2])))
    return sorted(group)

But I think it can be made much simpler.
If you need it for n lists:

def group_data(date, *counts):
    group = []
    for k, g in groupby(zip(date, *counts), itemgetter(0)):
        # transpose the rows of this group; columns[0] is the date column
        columns = list(zip(*g))
        group.append((k, *(sum(col) for col in columns[1:])))
    return sorted(group)
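
One caveat: itertools.groupby only merges consecutive items that share a key, so the functions above rely on date already being sorted. Here is a minimal sketch of a variant that sorts the rows first, assuming the input may arrive unsorted. The name group_data_sorted and the sample data are hypothetical and are not part of the original post.

```python
from itertools import groupby
from operator import itemgetter


def group_data_sorted(date, *counts):
    # Sort the rows by date first: groupby only merges *adjacent* equal keys.
    rows = sorted(zip(date, *counts), key=itemgetter(0))
    result = []
    for key, rows_for_key in groupby(rows, key=itemgetter(0)):
        # Transpose the group's rows into columns and drop the date column.
        _, *columns = zip(*rows_for_key)
        result.append((key, *(sum(col) for col in columns)))
    return result


# Hypothetical unsorted sample data, for illustration only.
date = ['2019/07/28', '2019/07/25', '2019/07/27', '2019/07/28', '2019/07/25']
count1 = [1, 1, 4, 3, 3]
count2 = [2, 2, 3, 1, 1]
print(group_data_sorted(date, count1, count2))
```

Sorting costs O(n log n). If the dates are guaranteed to be ordered already, the plain version above avoids that extra step.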