Pivoting a pandas dataframe with duplicate index values

metersk

I have a data frame which has rows for each user joining my site and making a purchase.

+---+-----+--------------------+---------+--------+-----+
|   | uid |        msg         |  _time  | gender | age |
+---+-----+--------------------+---------+--------+-----+
| 0 |   1 | confirmed_settings | 1/29/15 | M      |  37 |
| 1 |   1 | sale               | 4/13/15 | M      |  37 |
| 2 |   3 | confirmed_settings | 4/19/15 | M      |  35 |
| 3 |   4 | confirmed_settings | 2/21/15 | M      |  21 |
| 4 |   5 | confirmed_settings | 3/28/15 | M      |  18 |
| 5 |   4 | sale               | 3/15/15 | M      |  21 |
+---+-----+--------------------+---------+--------+-----+

I would like to change the dataframe so that each row is unique for a uid and there is a columns called sale and confirmed_settings which have the timestamp of the action. Note that not every user has a sale, but every user has a confirmed_settings. Like below:

+---+-----+--------------------+---------+---------+--------+-----+
|   | uid | confirmed_settings |  sale   |  _time  | gender | age |
+---+-----+--------------------+---------+---------+--------+-----+
| 0 |   1 | 1/29/15            | 4/13/15 | 1/29/15 | M      |  37 |
| 1 |   3 | 4/19/15            | null    | 4/19/15 | M      |  35 |
| 2 |   4 | 2/21/15            | 3/15/15 | 2/21/15 | M      |  21 |
| 3 |   5 | 3/28/15            | null    | 3/28/15 | M      |  18 |
+---+-----+--------------------+---------+---------+--------+-----+

To do this, I am trying:

df1 = df.pivot(index='uid', columns='msg', values='_time').reset_index()
df1 = df1.merge(df[['uid', 'gender', 'age']].drop_duplicates(), on='uid')

But I get this error: ValueError: Index contains duplicate entries, cannot reshape

How can I pivot a df with duplicate index values to transform my dataframe?


Edit: df1 = df.pivot_table(index='uid', columns='msg', values='_time').reset_index()

gives this error DataError: No numeric types to aggregate but im not even sure that is the right path to go on.

fixxxer

x is the data frame that you have as input :

    uid               msg   _time   gender  age
0   1   confirmed_settings  1/29/15 M       37
1   1   sale                4/13/15 M       37
2   3   confirmed_settings  4/19/15 M       35
3   4   confirmed_settings  2/21/15 M       21
4   5   confirmed_settings  3/28/15 M       18
5   4   sale                3/15/15 M       21

y = x.pivot(index='uid', columns='msg', values='_time')
x.join(y).drop('msg', axis=1)

gives you:

    uid _time   gender  age     confirmed_settings  sale
0   1   1/29/15     M   37                    NaN   NaN
1   1   4/13/15     M   37                1/29/15   4/13/15
2   3   4/19/15     M   35                    NaN   NaN
3   4   2/21/15     M   21                4/19/15   NaN
4   5   3/28/15     M   18                2/21/15   3/15/15
5   4   3/15/15     M   21                3/28/15   NaN

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related

From Dev

Pivoting pandas DataFrame -- AssertionError: Index length did not match values

From Dev

Pivoting a Pandas dataframe with a gapless daterange as index

From Dev

Pivoting a table with duplicate index

From Dev

Pandas dataframe pivoting

From Dev

Pivoting Dataframe with Pandas

From Dev

Reindex a dataframe with duplicate index values

From Dev

Pivot a dataframe with duplicate values in Index

From Dev

Counting duplicate values in Pandas DataFrame

From Dev

Multi-index pivoting in Pandas

From Java

Pivoting pandas dataframe by rank on id

From Dev

Pandas Dataframe Stacking versus Pivoting

From Dev

Resampling, grouping, pivoting a pandas dataframe

From Dev

SSIS - pivoting duplicate values in multiple records

From Dev

Collapsing entries with duplicate index values in DataFrame

From Dev

Collapsing entries with duplicate index values in DataFrame

From Java

How to remove duplicate values from multi-index pandas dataframe when saving to csv

From Dev

Pandas Divide dataframe by index values

From Dev

Finding count of duplicate values and ordering in a Pandas dataframe

From Dev

Check for duplicate values in Pandas dataframe column

From Dev

Use index values as category values in pandas dataframe

From Dev

Pandas: Pivoting with multi-index data

From Dev

pivoting pandas dataframe into prefixed cols, not a MultiIndex

From Dev

Pivoting a pandas dataframe to generate a (seaborn) heatmap

From Dev

How to efficiently columnize (=pivoting) pandas DataFrame (with groupby)?

From Dev

Slice Pandas dataframe by index values that are (not) in a list

From Dev

Pandas Dataframe - Count values based on index position

From Dev

Retrieving .loc index values in pandas dataframe

From Dev

Subset Pandas DataFrame Secondary Index and Reassigning values

From Dev

Getting the index of pandas dataframe for matching row values

Related Related

  1. 1

    Pivoting pandas DataFrame -- AssertionError: Index length did not match values

  2. 2

    Pivoting a Pandas dataframe with a gapless daterange as index

  3. 3

    Pivoting a table with duplicate index

  4. 4

    Pandas dataframe pivoting

  5. 5

    Pivoting Dataframe with Pandas

  6. 6

    Reindex a dataframe with duplicate index values

  7. 7

    Pivot a dataframe with duplicate values in Index

  8. 8

    Counting duplicate values in Pandas DataFrame

  9. 9

    Multi-index pivoting in Pandas

  10. 10

    Pivoting pandas dataframe by rank on id

  11. 11

    Pandas Dataframe Stacking versus Pivoting

  12. 12

    Resampling, grouping, pivoting a pandas dataframe

  13. 13

    SSIS - pivoting duplicate values in multiple records

  14. 14

    Collapsing entries with duplicate index values in DataFrame

  15. 15

    Collapsing entries with duplicate index values in DataFrame

  16. 16

    How to remove duplicate values from multi-index pandas dataframe when saving to csv

  17. 17

    Pandas Divide dataframe by index values

  18. 18

    Finding count of duplicate values and ordering in a Pandas dataframe

  19. 19

    Check for duplicate values in Pandas dataframe column

  20. 20

    Use index values as category values in pandas dataframe

  21. 21

    Pandas: Pivoting with multi-index data

  22. 22

    pivoting pandas dataframe into prefixed cols, not a MultiIndex

  23. 23

    Pivoting a pandas dataframe to generate a (seaborn) heatmap

  24. 24

    How to efficiently columnize (=pivoting) pandas DataFrame (with groupby)?

  25. 25

    Slice Pandas dataframe by index values that are (not) in a list

  26. 26

    Pandas Dataframe - Count values based on index position

  27. 27

    Retrieving .loc index values in pandas dataframe

  28. 28

    Subset Pandas DataFrame Secondary Index and Reassigning values

  29. 29

    Getting the index of pandas dataframe for matching row values

HotTag

Archive