Aggregating rows in a Python pandas DataFrame

melodyduany

I have a dataframe documenting when a product was added to and removed from a basket. However, the set_name column mixes two kinds of information: the color set and the shape set. See below:

   eff_date  prod_id   set_name         change_type           
0  20150414  20770     MONO COLOR SET   ADD             
1  20150414  20770     REC SHAPE SET    ADD         
2  20150429  132       MONO COLOR SET   ADD                
3  20150429  132       REC SHAPE SET    ADD        
4  20150521  199       MONO COLOR SET   DROP
5  20150521  199       REC SHAPE SET    DROP
6  20150521  199       TET SHAPE SET    ADD
7  20150521  199       MONO COLOR SET   ADD

I would like to split the two kinds of information in set_name into separate columns, color_set and shape_set, and drop set_name, so the previous df should look like:

   eff_date  prod_id   change_type  color_set       shape_set     
0  20150414  20770     ADD          MONO COLOR SET  REC SHAPE SET          
1  20150429  132       ADD          MONO COLOR SET  REC SHAPE SET
2  20150521  199       DROP         MONO COLOR SET  REC SHAPE SET
3  20150521  199       ADD          MONO COLOR SET  TET SHAPE SET

I first tried splitting out the columns in a for loop and then aggregating with groupby:

for index, row in df.iterrows():
    if 'COLOR' in df.loc[index,'set_name']:
        df.loc[index,'color_set'] = df.loc[index,'set_name']
    if 'SHAPE' in df.loc[index,'set_name']:
        df.loc[index,'shape_set'] = df.loc[index,'set_name']
df = df.fillna('')
df.groupby(['eff_date','prod_id','change_type']).agg({'color_set':'sum','shape_set':'sum'})  # 'sum' concatenates the strings in each group

However, this left me with a dataframe of only two columns and a multi-level index that I wasn't sure how to unstack.

                                color_set       shape_set
eff_date  prod_id  change_type 
20150414  20770    ADD          MONO COLOR SET  REC SHAPE SET
20150429  132      ADD          MONO COLOR SET  REC SHAPE SET
20150521  199      DROP         MONO COLOR SET  REC SHAPE SET
                   ADD          MONO COLOR SET  TET SHAPE SET

Any help on this is greatly appreciated!

user3483203

Your code looks fine apart from needing to reset the index, but it can be simplified quite a bit (in particular, removing the need for iterrows, which can be painfully slow) by using a pivot with a small trick to get your column names.
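To stay closer to the original attempt, here is a minimal sketch (assuming the sample data from the question) that replaces the iterrows loop with vectorized Series.where calls and then adds reset_index() after the groupby, which turns the three group keys back into ordinary columns. The names is_color and out are only illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'eff_date': [20150414, 20150414, 20150429, 20150429,
                 20150521, 20150521, 20150521, 20150521],
    'prod_id': [20770, 20770, 132, 132, 199, 199, 199, 199],
    'set_name': ['MONO COLOR SET', 'REC SHAPE SET', 'MONO COLOR SET', 'REC SHAPE SET',
                 'MONO COLOR SET', 'REC SHAPE SET', 'TET SHAPE SET', 'MONO COLOR SET'],
    'change_type': ['ADD', 'ADD', 'ADD', 'ADD', 'DROP', 'DROP', 'ADD', 'ADD'],
})

# Vectorized replacement for the iterrows loop: keep set_name where the
# condition holds, otherwise use '' (this also covers the fillna('') step)
is_color = df['set_name'].str.contains('COLOR')
df['color_set'] = df['set_name'].where(is_color, '')
df['shape_set'] = df['set_name'].where(df['set_name'].str.contains('SHAPE'), '')

# Same aggregation as in the question: 'sum' concatenates the strings per group.
# reset_index() moves the three group keys out of the MultiIndex into columns.
out = (
    df.groupby(['eff_date', 'prod_id', 'change_type'])
      .agg({'color_set': 'sum', 'shape_set': 'sum'})
      .reset_index()
)
print(out)
```

Note that groupby sorts its keys by default, so for 20150521 the ADD row comes before the DROP row.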

This answer assumes that your column only contains these two categories. If you have more, use numpy.select instead of numpy.where and define your conditions and outputs accordingly.
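A minimal sketch of the numpy.select variant, under stated assumptions: the SIZE category, the extra LARGE SIZE SET row and the size_set column are hypothetical, added only to show a third condition. np.select picks the first condition that matches, and rows matching none of them get the default label 'other'.

```python
import numpy as np
import pandas as pd

# Sample data from the question plus one hypothetical 'SIZE' row (not in the original data)
df = pd.DataFrame({
    'eff_date': [20150414, 20150414, 20150414, 20150429, 20150429,
                 20150521, 20150521, 20150521, 20150521],
    'prod_id': [20770, 20770, 20770, 132, 132, 199, 199, 199, 199],
    'set_name': ['MONO COLOR SET', 'REC SHAPE SET', 'LARGE SIZE SET',
                 'MONO COLOR SET', 'REC SHAPE SET', 'MONO COLOR SET',
                 'REC SHAPE SET', 'TET SHAPE SET', 'MONO COLOR SET'],
    'change_type': ['ADD', 'ADD', 'ADD', 'ADD', 'ADD',
                    'DROP', 'DROP', 'ADD', 'ADD'],
})

conditions = [
    df['set_name'].str.contains('COLOR'),
    df['set_name'].str.contains('SHAPE'),
    df['set_name'].str.contains('SIZE'),  # hypothetical extra category
]
choices = ['color_set', 'shape_set', 'size_set']

# First matching condition wins; unmatched rows are labelled 'other'
df['key'] = np.select(conditions, choices, default='other')

out = df.pivot_table(
    index=['eff_date', 'prod_id', 'change_type'],
    columns='key',
    values='set_name',
    aggfunc='first',
).reset_index()
print(out)
```

Groups that have no row for a category end up with NaN in that column (here size_set is filled only for product 20770).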


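import numpy as np
import pandas as pd

# Label each row with the column it should end up in (two categories only)
df['key'] = np.where(df['set_name'].str.contains('COLOR'), 'color_set', 'shape_set')

# Spread the labels into columns; 'first' keeps the single string in each cell
df.pivot_table(
  index=['eff_date', 'prod_id', 'change_type'],
  columns='key',
  values='set_name',
  aggfunc='first'
).reset_index()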

key  eff_date  prod_id change_type       color_set      shape_set
0    20150414    20770         ADD  MONO COLOR SET  REC SHAPE SET
1    20150429      132         ADD  MONO COLOR SET  REC SHAPE SET
2    20150521      199         ADD  MONO COLOR SET  TET SHAPE SET
3    20150521      199        DROP  MONO COLOR SET  REC SHAPE SET
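Since the question asked about unstacking: the same reshape can be sketched with set_index plus unstack, assuming each (eff_date, prod_id, change_type, key) combination appears only once. Unlike pivot_table with aggfunc='first', unstack raises a ValueError on duplicate entries, so it doubles as a check on that assumption.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'eff_date': [20150414, 20150414, 20150429, 20150429,
                 20150521, 20150521, 20150521, 20150521],
    'prod_id': [20770, 20770, 132, 132, 199, 199, 199, 199],
    'set_name': ['MONO COLOR SET', 'REC SHAPE SET', 'MONO COLOR SET', 'REC SHAPE SET',
                 'MONO COLOR SET', 'REC SHAPE SET', 'TET SHAPE SET', 'MONO COLOR SET'],
    'change_type': ['ADD', 'ADD', 'ADD', 'ADD', 'DROP', 'DROP', 'ADD', 'ADD'],
})
df['key'] = np.where(df['set_name'].str.contains('COLOR'), 'color_set', 'shape_set')

# Put the group keys and 'key' into the index, then pivot 'key' into columns.
out = (
    df.set_index(['eff_date', 'prod_id', 'change_type', 'key'])['set_name']
      .unstack('key')            # one column per key value
      .reset_index()             # group keys back to ordinary columns
      .rename_axis(columns=None) # drop the leftover 'key' label on the columns
)
print(out)
```

The rename_axis(columns=None) step only removes the key label that appears above the columns in the pivot output; leave it out if you don't mind it.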


Related articles

Expand rows in python pandas dataframe
Fetching all rows of dataframe using Python pandas
Python - Aggregating two rows with different operations for different columns
Aggregating neighboring rows with partitioning
How to rename the rows in dataframe using pandas read (Python)?
Python pandas: insert rows for missing dates, time series in groupby dataframe
validate rows before computing values using python/pandas DataFrame
Aggregating cells in dataframe into a sequence
Aggregating rows for multiple columns in R
Sum of specific rows in a dataframe (Pandas)
Pandas DataFrame - dictionary in rows to columns
Python joining rows of a pandas dataframe over same values and aggregrating string values
Using python/pandas to search dataframe rows containing both a user-specified integer and approximated float value
Filling DataFrame Pandas Python
Python : Pandas DataFrame to CSV
reverse dataframe's rows' order with pandas
How to merge two rows in a dataframe pandas
How to shift several rows in a pandas DataFrame?
Randomly assign values to subset of rows in pandas dataframe
Computing percentage difference between pandas dataframe rows
Grab rows with max date from pandas dataframe
Remove rows from pandas DataFrame based on condition
Combine Pandas DataFrame Rows by Timestamp and Column
Pandas Dataframe filter rows by only one column
Selecting and using duplicate rows in a Pandas dataFrame
How to remove unique rows in pandas dataframe
Pandas - Add new rows to dataframe with arithmetic
Merge Multiple Columns As New Rows in Pandas Dataframe
Concatenate values from earlier rows in a pandas dataframe
