I'm looking for a way to chain groupby and apply, like this (see the code below for a concrete example):
df.groupby("a").apply(func_1).groupby("b").apply(func_2)
I guess it doesn't work because groupby needs a DataFrame as input, which is not always the case for the second groupby above (it may receive a Series, as in the example). One solution could be for the first apply to return the result of func_1 together with the original DataFrame, but I haven't found how to do this.
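To make the failure concrete, here is a minimal sketch of what the first groupby/apply step returns. The helper `area` is a hypothetical stand-in for trapz (trapezoidal rule with unit spacing), written out so the snippet does not depend on a particular NumPy or SciPy version. The intermediate result is a Series indexed by "b", so there is no "c" left to group on:

```python
import numpy as np
import pandas as pd

def area(y):
    """Trapezoidal area with unit spacing (hypothetical stand-in for trapz)."""
    y = np.asarray(y, dtype=float)
    return float(((y[1:] + y[:-1]) / 2).sum())

df = pd.DataFrame({"a": np.arange(8),
                   "b": np.repeat(np.arange(4), 2),
                   "c": np.repeat(np.arange(2), 4)})

# First step: one area per group of "b".
step1 = df.groupby("b")["a"].apply(area)

# step1 is a Series whose index is "b"; the column "c" no longer exists,
# so grouping by "c" cannot find the key.
try:
    step1.groupby("c").sum()
    failed = False
except KeyError:
    failed = True
```

So the second groupby fails because the key "c" was dropped by the first step, not because it received a Series.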
I'm looking for a general workaround, not just a workaround for this specific example.
Example: Let's say I want to compute the area under the curve of a for each group in b, and then compute the sum of these areas for each group in c.
import numpy as np
import pandas as pd
df=pd.DataFrame({"a":np.arange(8),"b":np.repeat(np.arange(4),2),
"c":np.repeat(np.arange(2),4)})
df
a b c
0 0 0 0
1 1 0 0
2 2 1 0
3 3 1 0
4 4 2 1
5 5 2 1
6 6 3 1
7 7 3 1
df.groupby("b").apply(lambda x: trapz(x["a"])).groupby("c").apply(sum)
Traceback (most recent call last):
[...]
KeyError: 'c'
#Expected output
c
0 3.0
1 11.0
#I know that this code works, but I would like to avoid modifying
#my dataframe:
df["result"]=list(df
.groupby("b").apply(lambda x: trapz(x["a"]))
.repeat(df.groupby("b").size()))
df.groupby("b").first().groupby("c").result.sum()
Any help greatly appreciated!
I think I would do something like:
# your_fun is the function you want to apply
df.groupby('c').apply(lambda f: sum(f.groupby('b')['a'].apply(your_fun)))
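For reference, here is a runnable sketch of that nested idea on the example data, plus a chained variant that may be closer to what the question asks for. It assumes that each "b" group lies entirely inside one "c" group, as in the example. The helper `area` is a hypothetical stand-in for your_fun / trapz (trapezoidal rule with unit spacing).

```python
import numpy as np
import pandas as pd

def area(y):
    """Trapezoidal area with unit spacing (hypothetical stand-in for trapz)."""
    y = np.asarray(y, dtype=float)
    return float(((y[1:] + y[:-1]) / 2).sum())

df = pd.DataFrame({"a": np.arange(8),
                   "b": np.repeat(np.arange(4), 2),
                   "c": np.repeat(np.arange(2), 4)})

# Nested form: inside each "c" group, compute one area per "b" subgroup
# and add the areas up.
nested = (df.groupby("c")[["a", "b"]]
            .apply(lambda f: f.groupby("b")["a"].apply(area).sum()))

# Chained form: group by both keys so that "c" survives in the index,
# then aggregate over the "c" level.
chained = (df.groupby(["c", "b"])["a"].apply(area)
             .groupby(level="c").sum())

print(nested)
print(chained)
```

Both variants leave df unmodified. The chained form generalizes by listing every key needed by a later step in the first groupby, then grouping on the remaining index levels.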