How to sum within a group of values and then take the difference from another group?

MEhsan Published at Dev


Let's say I have this simplified dataframe with three variables:

ID    sample  test_result
P1    Normal           9
P1    Normal           18
P2    Normal           7
P2    Normal           16
P3    Normal           2
P3    Normal           11
P1     Tumor           6
P1     Tumor           15
P2     Tumor           5
P2     Tumor           15
P3     Tumor           3
P3     Tumor           12
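To make the question reproducible, here is a minimal sketch that rebuilds the table above as a pandas DataFrame. The column names and values are copied from the table; only the construction code is added.

```python
import pandas as pd

# Rebuild the example table from the question: three IDs, each measured
# twice in the Normal sample and twice in the Tumor sample.
df = pd.DataFrame({
    "ID": ["P1", "P1", "P2", "P2", "P3", "P3"] * 2,
    "sample": ["Normal"] * 6 + ["Tumor"] * 6,
    "test_result": [9, 18, 7, 16, 2, 11, 6, 15, 5, 15, 3, 12],
})
print(df)
```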

I want to know how to sum the test_result values for each ID within each sample type (i.e. Normal, Tumor), and then take the difference between the summed Normal and Tumor values.

I have tried grouping by the sample column and then calling the diff() method on the test_result column, but that did not work. I guess I need to apply .sum() first, but I'm not sure how.

Here is what I have tried:

df.groupby('sample')['test_result'].diff()
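For context, a small sketch of what the attempted call actually returns, using the question's data. diff() subtracts the previous row's value from each row within each sample group, so it never sums per ID and never compares Normal against Tumor.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["P1", "P1", "P2", "P2", "P3", "P3"] * 2,
    "sample": ["Normal"] * 6 + ["Tumor"] * 6,
    "test_result": [9, 18, 7, 16, 2, 11, 6, 15, 5, 15, 3, 12],
})

# Row-by-row differences inside each 'sample' group; the first row of each
# group has no predecessor, so it becomes NaN.
row_diffs = df.groupby("sample")["test_result"].diff()
print(row_diffs.tolist())
```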

The output I expect is:

ID   test_result
P1             6 # (the sum of P1 Normal = 27) - (the sum of P1 Tumor = 21)  
P2             3
P3            -2

Any idea how to tackle this?

jezrael

Use groupby with sum, then reshape with unstack:

df = df.groupby(['ID','sample'])['test_result'].sum().unstack()
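A short sketch of the intermediate table this line produces (the data is rebuilt from the question). After unstack, each ID is one row and the Normal and Tumor sums sit in separate columns, which is what makes the later subtraction a plain column operation.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["P1", "P1", "P2", "P2", "P3", "P3"] * 2,
    "sample": ["Normal"] * 6 + ["Tumor"] * 6,
    "test_result": [9, 18, 7, 16, 2, 11, 6, 15, 5, 15, 3, 12],
})

# Sum test_result for every (ID, sample) pair, then move 'sample' from the
# row index into the columns so Normal and Tumor appear side by side.
wide = df.groupby(["ID", "sample"])["test_result"].sum().unstack()
print(wide)
```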

Or use pivot_table:

df = df.pivot_table(index='ID',columns='sample', values='test_result', aggfunc='sum')
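A quick sketch to check that the two routes agree on the question's data. Both produce one row per ID and one column per sample, so an element-wise comparison is possible. Since each line above reassigns df, the sketch stores the results under separate names (pivoted, grouped) instead.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["P1", "P1", "P2", "P2", "P3", "P3"] * 2,
    "sample": ["Normal"] * 6 + ["Tumor"] * 6,
    "test_result": [9, 18, 7, 16, 2, 11, 6, 15, 5, 15, 3, 12],
})

pivoted = df.pivot_table(index="ID", columns="sample",
                         values="test_result", aggfunc="sum")
grouped = df.groupby(["ID", "sample"])["test_result"].sum().unstack()

# Same row labels and column labels, so == compares cell by cell.
print((pivoted == grouped).all().all())
```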

Then subtract one column from the other:

df['new'] = df['Normal'] - df['Tumor']
print(df)
sample  Normal  Tumor  new
ID                        
P1          27     21    6
P2          23     20    3
P3          13     15   -2
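As an alternative that is not part of the answer above, here is a sketch that skips the reshape. Each Tumor value gets a negative sign, and a single groupby on ID then sums to Normal minus Tumor directly. This assumes every row's sample is exactly "Normal" or "Tumor". Series.map turns any other label into NaN, and sum() skips NaN, so those rows would silently drop out.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["P1", "P1", "P2", "P2", "P3", "P3"] * 2,
    "sample": ["Normal"] * 6 + ["Tumor"] * 6,
    "test_result": [9, 18, 7, 16, 2, 11, 6, 15, 5, 15, 3, 12],
})

# +1 for Normal rows, -1 for Tumor rows (any other label would map to NaN).
sign = df["sample"].map({"Normal": 1, "Tumor": -1})

# Summing the signed values per ID gives sum(Normal) - sum(Tumor).
diff_by_id = (df["test_result"] * sign).groupby(df["ID"]).sum()

# Shape it like the expected output: columns ID and test_result.
result = diff_by_id.rename("test_result").reset_index()
print(result)
```

This avoids building the wide table. The pivot-based version is easier to inspect, though, because it keeps the individual Normal and Tumor sums.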
