pandas dataframe: subset by column + groupby another column

Lisa

I'm new to pandas DataFrames and would appreciate help with the following problem. I have the following data:

from pandas import DataFrame
from scipy.stats import ttest_ind

data = {'Cat1': [2,1,2,1,2,1,2,1,1,1,2],
        'Cat2': [0,0,0,0,0,0,1,1,1,1,1],
        'values': [1,2,3,1,2,3,1,2,3,5,1]}
my_data = DataFrame(data)

For every category in Cat2, I would like to run ttest_ind on the values to compare the two categories in Cat1.

The way I see it, I could first split the data by Cat1:

cat1_1 = my_data[my_data['Cat1']==1]
cat1_2 = my_data[my_data['Cat1']==2]

And then loop through every value in Cat2 to perform a t-test:

for cat2 in [0,1]:

    subset_1 = cat1_1[cat1_1['Cat2']==cat2]
    subset_2 = cat1_2[cat1_2['Cat2']==cat2]

    t, p = ttest_ind(subset_1['values'], subset_2['values'])

But this seems really convoluted. Could there be a simpler solution, maybe with groupby? Thanks a lot!

jezrael

If I understand correctly, you can group by column Cat2 and apply a function f to each group:

import pandas as pd
from scipy.stats import ttest_ind

data = {'Cat1': [2,1,2,1,2,1,2,1,1,1,2],
        'Cat2': [0,0,0,0,0,0,1,1,1,1,1],
        'values': [1,2,3,1,2,3,1,2,3,5,1]}
my_data = pd.DataFrame(data)
print(my_data)
    Cat1  Cat2  values
0      2     0       1
1      1     0       2
2      2     0       3
3      1     0       1
4      2     0       2
5      1     0       3
6      2     1       1
7      1     1       2
8      1     1       3
9      1     1       5
10     2     1       1

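def f(x):
    # x is the sub-DataFrame for one Cat2 value; split it by Cat1
    cat1_1 = x[x['Cat1'] == 1]
    cat1_2 = x[x['Cat1'] == 2]

    # two-sample t-test on the values of the two Cat1 groups
    t, p = ttest_ind(cat1_1['values'], cat1_2['values'])
    # a: t-statistic, b: p-value
    return pd.Series({'a': t, 'b': p})

print(my_data.groupby('Cat2').apply(f))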
            a         b
Cat2                   
0     0.00000  1.000000
1     2.04939  0.132842  
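
As a variation on the above, here is a minimal sketch that iterates over the groups directly instead of going through apply, and gives the result columns descriptive names. The function name ttest_by_cat1 and the labels t_stat and p_value are my own choices for illustration, not part of the original answer; the data is the same as in the question.

```python
import pandas as pd
from scipy.stats import ttest_ind

data = {'Cat1': [2, 1, 2, 1, 2, 1, 2, 1, 1, 1, 2],
        'Cat2': [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
        'values': [1, 2, 3, 1, 2, 3, 1, 2, 3, 5, 1]}
my_data = pd.DataFrame(data)


def ttest_by_cat1(group):
    """Compare 'values' between Cat1 == 1 and Cat1 == 2 within one group.

    Hypothetical helper: assumes Cat1 only takes the values 1 and 2.
    """
    first = group.loc[group['Cat1'] == 1, 'values']
    second = group.loc[group['Cat1'] == 2, 'values']
    t, p = ttest_ind(first, second)
    return pd.Series({'t_stat': t, 'p_value': p})


# Iterating over a groupby yields (key, sub-DataFrame) pairs,
# so each Cat2 value gets its own t-test.
rows = {cat2: ttest_by_cat1(group) for cat2, group in my_data.groupby('Cat2')}

# One row per Cat2 value, one column per statistic.
result = pd.DataFrame(rows).T
result.index.name = 'Cat2'
print(result)
```

This should give the same numbers as the apply version, just with t_stat and p_value in place of a and b.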
