Pivoting a pandas dataframe with duplicate index values

metersk

I have a data frame which has rows for each user joining my site and making a purchase.

+---+-----+--------------------+---------+--------+-----+
|   | uid |        msg         |  _time  | gender | age |
+---+-----+--------------------+---------+--------+-----+
| 0 |   1 | confirmed_settings | 1/29/15 | M      |  37 |
| 1 |   1 | sale               | 4/13/15 | M      |  37 |
| 2 |   3 | confirmed_settings | 4/19/15 | M      |  35 |
| 3 |   4 | confirmed_settings | 2/21/15 | M      |  21 |
| 4 |   5 | confirmed_settings | 3/28/15 | M      |  18 |
| 5 |   4 | sale               | 3/15/15 | M      |  21 |
+---+-----+--------------------+---------+--------+-----+

I would like to change the dataframe so that each row is unique for a uid and there is a columns called sale and confirmed_settings which have the timestamp of the action. Note that not every user has a sale, but every user has a confirmed_settings. Like below:

+---+-----+--------------------+---------+---------+--------+-----+
|   | uid | confirmed_settings |  sale   |  _time  | gender | age |
+---+-----+--------------------+---------+---------+--------+-----+
| 0 |   1 | 1/29/15            | 4/13/15 | 1/29/15 | M      |  37 |
| 1 |   3 | 4/19/15            | null    | 4/19/15 | M      |  35 |
| 2 |   4 | 2/21/15            | 3/15/15 | 2/21/15 | M      |  21 |
| 3 |   5 | 3/28/15            | null    | 3/28/15 | M      |  18 |
+---+-----+--------------------+---------+---------+--------+-----+

To do this, I am trying:

df1 = df.pivot(index='uid', columns='msg', values='_time').reset_index()
df1 = df1.merge(df[['uid', 'gender', 'age']].drop_duplicates(), on='uid')

But I get this error: ValueError: Index contains duplicate entries, cannot reshape

How can I pivot a df with duplicate index values to transform my dataframe?


Edit: df1 = df.pivot_table(index='uid', columns='msg', values='_time').reset_index()

gives this error DataError: No numeric types to aggregate but im not even sure that is the right path to go on.

fixxxer

x is the data frame that you have as input :

    uid               msg   _time   gender  age
0   1   confirmed_settings  1/29/15 M       37
1   1   sale                4/13/15 M       37
2   3   confirmed_settings  4/19/15 M       35
3   4   confirmed_settings  2/21/15 M       21
4   5   confirmed_settings  3/28/15 M       18
5   4   sale                3/15/15 M       21

y = x.pivot(index='uid', columns='msg', values='_time')
x.join(y).drop('msg', axis=1)

gives you:

    uid _time   gender  age     confirmed_settings  sale
0   1   1/29/15     M   37                    NaN   NaN
1   1   4/13/15     M   37                1/29/15   4/13/15
2   3   4/19/15     M   35                    NaN   NaN
3   4   2/21/15     M   21                4/19/15   NaN
4   5   3/28/15     M   18                2/21/15   3/15/15
5   4   3/15/15     M   21                3/28/15   NaN

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related

From Java

Pivoting pandas dataframe by rank on id

From Java

How to remove duplicate values from multi-index pandas dataframe when saving to csv

From Dev

Multi-index pivoting in Pandas

From Dev

Pivoting pandas DataFrame -- AssertionError: Index length did not match values

From Dev

Pandas Dataframe - Count values based on index position

From Dev

Pandas: Pivoting with multi-index data

From Dev

Pandas Dataframe Stacking versus Pivoting

From Dev

Retrieving .loc index values in pandas dataframe

From Dev

Pivoting a Pandas dataframe with a gapless daterange as index

From Dev

pivoting pandas dataframe into prefixed cols, not a MultiIndex

From Dev

Resampling, grouping, pivoting a pandas dataframe

From Dev

Slice Pandas dataframe by index values that are (not) in a list

From Dev

Reindex a dataframe with duplicate index values

From Dev

Pivoting a pandas dataframe to generate a (seaborn) heatmap

From Dev

How to efficiently columnize (=pivoting) pandas DataFrame (with groupby)?

From Dev

Collapsing entries with duplicate index values in DataFrame

From Dev

Subset Pandas DataFrame Secondary Index and Reassigning values

From Dev

Counting duplicate values in Pandas DataFrame

From Dev

SSIS - pivoting duplicate values in multiple records

From Dev

Finding count of duplicate values and ordering in a Pandas dataframe

From Dev

Getting the index of pandas dataframe for matching row values

From Dev

Pandas dataframe pivoting

From Dev

Pandas Divide dataframe by index values

From Dev

Check for duplicate values in Pandas dataframe column

From Dev

Pivoting a table with duplicate index

From Dev

Pivot a dataframe with duplicate values in Index

From Dev

Collapsing entries with duplicate index values in DataFrame

From Dev

Pivoting Dataframe with Pandas

From Dev

Use index values as category values in pandas dataframe

Related Related

  1. 1

    Pivoting pandas dataframe by rank on id

  2. 2

    How to remove duplicate values from multi-index pandas dataframe when saving to csv

  3. 3

    Multi-index pivoting in Pandas

  4. 4

    Pivoting pandas DataFrame -- AssertionError: Index length did not match values

  5. 5

    Pandas Dataframe - Count values based on index position

  6. 6

    Pandas: Pivoting with multi-index data

  7. 7

    Pandas Dataframe Stacking versus Pivoting

  8. 8

    Retrieving .loc index values in pandas dataframe

  9. 9

    Pivoting a Pandas dataframe with a gapless daterange as index

  10. 10

    pivoting pandas dataframe into prefixed cols, not a MultiIndex

  11. 11

    Resampling, grouping, pivoting a pandas dataframe

  12. 12

    Slice Pandas dataframe by index values that are (not) in a list

  13. 13

    Reindex a dataframe with duplicate index values

  14. 14

    Pivoting a pandas dataframe to generate a (seaborn) heatmap

  15. 15

    How to efficiently columnize (=pivoting) pandas DataFrame (with groupby)?

  16. 16

    Collapsing entries with duplicate index values in DataFrame

  17. 17

    Subset Pandas DataFrame Secondary Index and Reassigning values

  18. 18

    Counting duplicate values in Pandas DataFrame

  19. 19

    SSIS - pivoting duplicate values in multiple records

  20. 20

    Finding count of duplicate values and ordering in a Pandas dataframe

  21. 21

    Getting the index of pandas dataframe for matching row values

  22. 22

    Pandas dataframe pivoting

  23. 23

    Pandas Divide dataframe by index values

  24. 24

    Check for duplicate values in Pandas dataframe column

  25. 25

    Pivoting a table with duplicate index

  26. 26

    Pivot a dataframe with duplicate values in Index

  27. 27

    Collapsing entries with duplicate index values in DataFrame

  28. 28

    Pivoting Dataframe with Pandas

  29. 29

    Use index values as category values in pandas dataframe

HotTag

Archive