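I couldn't find a solution online, so I'm wondering whether anyone has any ideas.
I have 3 columns: one numeric ('value') and 2 categorical ('fruit' and 'country'). I want to compute the mean and the sum of the numeric column for each combination of the categorical columns.
Here is what I can do so far: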
import pandas as pd
df = pd.DataFrame({'value':[34, 65, 65, 34, 53, 23, 54, 23, 65, 43, 23, 34, 54],'value1':[54, 45, 23, 43, 66, 12, 87, 34, 54, 23, 65, 34, 76], 'fruit': ['Plum', 'Apple', 'Orange', 'Pear', 'Grape', 'Apple', 'Orange', 'Banana', 'Orange', 'Grape', 'Apple', 'Banana', 'Tomato'], 'country': ['USA', 'USA', 'USA', 'UK', 'UK', 'UK', 'CANADA', 'CANADA', 'CANADA', 'USA', 'USA', 'UK', 'USA']})[['value', 'fruit', 'country']]
# print(df)
# numeric_only=True skips the non-numeric 'country' column (recent pandas versions raise a TypeError without it)
print(df.groupby(['fruit']).mean(numeric_only=True)) # sum, etc. work just as well
            value
fruit
Apple   37.000000
Banana  28.500000
Grape   48.000000
Orange  61.333333
Pear    34.000000
Plum    34.000000
Tomato  54.000000
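That works for a single column, but what I want is the same calculation for every fruit/country combination.
I'm confused about how to combine multiple columns in a groupby and whether I need to build the permutations myself.
Below is my failed attempt. Any suggestions on how to do this, or pointers to a command I can read up on, would be appreciated.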
from itertools import permutations, combinations
perms = [perm for perm in permutations(df[['fruit', 'country']])]
print(perms) #doesn't work, only results in permutations of words fruit/country so 2 sets.
# [('fruit', 'country'), ('country', 'fruit')]
#trying to create combinations using combinations in itertools
g = df.groupby('fruit').apply(lambda x:
pd.Series(list(combinations(x.country, 2))))
df = pd.DataFrame(g.apply(list).tolist(), columns=['country', 'fruit'])
df
# Not what I want either; this is not the combination I was looking for.
# country fruit
# 0 USA UK
# 1 USA USA
# 2 UK USA
# 3 CANADA UK
# 4 UK USA
# 5 USA CANADA
# 6 USA CANADA
# 7 CANADA CANADA
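This is what you want. Note that these examples use the full DataFrame, without the trailing [['value', 'fruit', 'country']] selection from the question, since they also aggregate value1.
Case 1: group by one column and take the mean of value and value1: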
In [1018]: df.groupby('fruit').agg({'value': 'mean', 'value1': 'mean'}).reset_index()
Out[1018]:
    fruit      value     value1
0   Apple  37.000000  40.666667
1  Banana  28.500000  34.000000
2   Grape  48.000000  44.500000
3  Orange  61.333333  54.666667
4    Pear  34.000000  43.000000
5    Plum  34.000000  54.000000
6  Tomato  54.000000  76.000000
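As a side note, the same per-fruit means can also be obtained by selecting the numeric columns before aggregating, which avoids spelling out a dictionary. This is only a minimal sketch, assuming the full DataFrame from the question (with value1 kept); the name case1 is purely illustrative.

```python
import pandas as pd

# Full data from the question, keeping 'value1' (assumption: no column selection applied).
df = pd.DataFrame({
    'value':  [34, 65, 65, 34, 53, 23, 54, 23, 65, 43, 23, 34, 54],
    'value1': [54, 45, 23, 43, 66, 12, 87, 34, 54, 23, 65, 34, 76],
    'fruit': ['Plum', 'Apple', 'Orange', 'Pear', 'Grape', 'Apple', 'Orange',
              'Banana', 'Orange', 'Grape', 'Apple', 'Banana', 'Tomato'],
    'country': ['USA', 'USA', 'USA', 'UK', 'UK', 'UK', 'CANADA',
                'CANADA', 'CANADA', 'USA', 'USA', 'UK', 'USA'],
})

# Pick the numeric columns first, then apply one aggregation to all of them.
case1 = df.groupby('fruit')[['value', 'value1']].mean().reset_index()
print(case1)
```

Selecting the columns explicitly also keeps the non-numeric country column out of the aggregation.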
Case 2: group by multiple columns and aggregate:
In [1017]: df.groupby(['fruit','country']).agg({'value': 'mean', 'value1': 'mean'}).reset_index()
Out[1017]:
     fruit  country  value  value1
0    Apple       UK   23.0    12.0
1    Apple      USA   44.0    55.0
2   Banana   CANADA   23.0    34.0
3   Banana       UK   34.0    34.0
4    Grape       UK   53.0    66.0
5    Grape      USA   43.0    23.0
6   Orange   CANADA   59.5    70.5
7   Orange      USA   65.0    23.0
8     Pear       UK   34.0    43.0
9     Plum      USA   34.0    54.0
10  Tomato      USA   54.0    76.0
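Since the question asks for both the mean and the sum, one option is named aggregation (available in pandas 0.25 and later), which computes several statistics per fruit/country combination in a single call. The following is a sketch under the same assumption as above (the full DataFrame, with value1 kept); the output column names such as value_mean and the variable summary are made up for illustration.

```python
import pandas as pd

# Full data from the question, keeping 'value1' (assumption: no column selection applied).
df = pd.DataFrame({
    'value':  [34, 65, 65, 34, 53, 23, 54, 23, 65, 43, 23, 34, 54],
    'value1': [54, 45, 23, 43, 66, 12, 87, 34, 54, 23, 65, 34, 76],
    'fruit': ['Plum', 'Apple', 'Orange', 'Pear', 'Grape', 'Apple', 'Orange',
              'Banana', 'Orange', 'Grape', 'Apple', 'Banana', 'Tomato'],
    'country': ['USA', 'USA', 'USA', 'UK', 'UK', 'UK', 'CANADA',
                'CANADA', 'CANADA', 'USA', 'USA', 'UK', 'USA'],
})

# Each keyword is an output column: (input column, aggregation function).
summary = (
    df.groupby(['fruit', 'country'])
      .agg(
          value_mean=('value', 'mean'),
          value_sum=('value', 'sum'),
          value1_mean=('value1', 'mean'),
          value1_sum=('value1', 'sum'),
      )
      .reset_index()
)
print(summary)
```

With plain string keys like these, groupby only produces rows for the fruit/country pairs that actually occur in the data, so there is no need to build the combinations by hand with itertools.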
Update based on the OP's comment:
In [1039]: res = df.groupby(['fruit','country']).agg({'value': 'mean', 'value1': 'mean'}).reset_index()
In [1041]: res['total'] = res.value + res.value1
In [1042]: res
Out[1042]:
     fruit  country  value  value1  total
0    Apple       UK   23.0    12.0   35.0
1    Apple      USA   44.0    55.0   99.0
2   Banana   CANADA   23.0    34.0   57.0
3   Banana       UK   34.0    34.0   68.0
4    Grape       UK   53.0    66.0  119.0
5    Grape      USA   43.0    23.0   66.0
6   Orange   CANADA   59.5    70.5  130.0
7   Orange      USA   65.0    23.0   88.0
8     Pear       UK   34.0    43.0   77.0
9     Plum      USA   34.0    54.0   88.0
10  Tomato      USA   54.0    76.0  130.0
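If a single expression is preferred, the total column can probably be added in the same chain with assign. This is just a sketch, again assuming the full DataFrame with value1; note that total here is the sum of the two group means, exactly as in the update above.

```python
import pandas as pd

# Full data from the question, keeping 'value1' (assumption: no column selection applied).
df = pd.DataFrame({
    'value':  [34, 65, 65, 34, 53, 23, 54, 23, 65, 43, 23, 34, 54],
    'value1': [54, 45, 23, 43, 66, 12, 87, 34, 54, 23, 65, 34, 76],
    'fruit': ['Plum', 'Apple', 'Orange', 'Pear', 'Grape', 'Apple', 'Orange',
              'Banana', 'Orange', 'Grape', 'Apple', 'Banana', 'Tomato'],
    'country': ['USA', 'USA', 'USA', 'UK', 'UK', 'UK', 'CANADA',
                'CANADA', 'CANADA', 'USA', 'USA', 'UK', 'USA'],
})

res = (
    df.groupby(['fruit', 'country'])[['value', 'value1']]
      .mean()                                              # per-group means
      .assign(total=lambda d: d['value'] + d['value1'])    # sum of the two means
      .reset_index()
)
print(res)
```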