How to resample data in Pandas with discrete data?

smurfit89

I am stuck with pandas. My idea is to resample data that are expressed as factors. For example, I have observed two cats named Charles and Valentine. Since the animals keep up a behaviour for a long time, an observation is only recorded when the current behaviour changes. I want to resample this to get minute-wise data.

name;timestamp;activity
Charles;10.10.18 12:31;drinks
Charles;10.10.18 12:51;sleep
Charles;10.10.18 13:01;mouse
Valentine;10.10.18 12:31;drinks
Valentine;10.10.18 12:51;sleep
Valentine;10.10.18 13:01;purr
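To reproduce the example, the semicolon-separated data above can be loaded into a DataFrame as sketched below. This assumes the text is available as a string (with a file you would pass its path to `read_csv` instead). It also assumes the timestamps are written day.month.year, which the sample itself cannot settle because day and month are both 10.

```python
import io

import pandas as pd

# The observations from the question, inlined so the sketch runs on its own.
RAW = """name;timestamp;activity
Charles;10.10.18 12:31;drinks
Charles;10.10.18 12:51;sleep
Charles;10.10.18 13:01;mouse
Valentine;10.10.18 12:31;drinks
Valentine;10.10.18 12:51;sleep
Valentine;10.10.18 13:01;purr
"""

# The fields are separated by semicolons, so sep=";" is required.
df = pd.read_csv(io.StringIO(RAW), sep=";")

# Assumption: day.month.two-digit-year hour:minute. An explicit format avoids
# relying on pandas guessing the order of day and month.
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%d.%m.%y %H:%M")

print(df.dtypes)
```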

My desired output should look like this:

name       timestamp       activity
Charles    10.10.18 12:31  drinks
Charles    10.10.18 12:32  drinks
Charles    10.10.18 12:33  drinks
Charles    10.10.18 12:34  drinks
Charles    10.10.18 12:35  drinks
Charles    10.10.18 12:36  drinks
Charles    10.10.18 12:37  drinks
Charles    10.10.18 12:38  drinks
Charles    10.10.18 12:39  drinks
Charles    10.10.18 12:40  drinks
Charles    10.10.18 12:41  drinks
Charles    10.10.18 12:42  drinks
Charles    10.10.18 12:43  drinks
Charles    10.10.18 12:44  drinks
Charles    10.10.18 12:45  drinks
Charles    10.10.18 12:46  drinks
Charles    10.10.18 12:47  drinks
Charles    10.10.18 12:48  drinks
Charles    10.10.18 12:49  drinks
Charles    10.10.18 12:50  drinks
Charles    10.10.18 12:51  sleep
Charles    10.10.18 12:52  sleep
Charles    10.10.18 12:53  sleep
Charles    10.10.18 12:54  sleep
Charles    10.10.18 12:55  sleep
Charles    10.10.18 12:56  sleep
Charles    10.10.18 12:57  sleep
Charles    10.10.18 12:58  sleep
Charles    10.10.18 12:59  sleep
Charles    10.10.18 13:00  sleep
Charles    10.10.18 13:01  mouse
Valentine  10.10.18 12:31  drinks
Valentine  10.10.18 12:32  drinks
Valentine  10.10.18 12:33  drinks
Valentine  10.10.18 12:34  drinks
Valentine  10.10.18 12:35  drinks
Valentine  10.10.18 12:36  drinks
Valentine  10.10.18 12:37  drinks
Valentine  10.10.18 12:38  drinks
Valentine  10.10.18 12:39  drinks
Valentine  10.10.18 12:40  drinks
Valentine  10.10.18 12:41  drinks
Valentine  10.10.18 12:42  drinks
Valentine  10.10.18 12:43  drinks
Valentine  10.10.18 12:44  drinks
Valentine  10.10.18 12:45  drinks
Valentine  10.10.18 12:46  drinks
Valentine  10.10.18 12:47  drinks
Valentine  10.10.18 12:48  drinks
Valentine  10.10.18 12:49  drinks
Valentine  10.10.18 12:50  drinks
Valentine  10.10.18 12:51  sleep
Valentine  10.10.18 12:52  sleep
Valentine  10.10.18 12:53  sleep
Valentine  10.10.18 12:54  sleep
Valentine  10.10.18 12:55  sleep
Valentine  10.10.18 12:56  sleep
Valentine  10.10.18 12:57  sleep
Valentine  10.10.18 12:58  sleep
Valentine  10.10.18 12:59  sleep
Valentine  10.10.18 13:00  sleep
Valentine  10.10.18 13:01  purr

Using

data.resample('60S').pad()

didn't work, because pandas complains that the timestamps are not unique.
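The complaint is about the index. Every timestamp appears once per cat, so after `set_index('timestamp')` the combined frame has duplicate index labels, and upsampling onto a regular one-minute grid is not well defined when labels repeat. A small check on the sample data (inlined here, with day.month.year timestamps assumed) makes the duplicates visible:

```python
import io

import pandas as pd

RAW = """name;timestamp;activity
Charles;10.10.18 12:31;drinks
Charles;10.10.18 12:51;sleep
Charles;10.10.18 13:01;mouse
Valentine;10.10.18 12:31;drinks
Valentine;10.10.18 12:51;sleep
Valentine;10.10.18 13:01;purr
"""

df = pd.read_csv(io.StringIO(RAW), sep=";")
# Assumption: day.month.two-digit-year timestamps.
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%d.%m.%y %H:%M")

indexed = df.set_index("timestamp")

# Both cats were observed at the same three times, so index labels repeat.
print(indexed.index.is_unique)           # False
print(indexed.index.duplicated().sum())  # 3
```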

Subsetting the data to one cat at a time didn't help much either.
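For reference, subsetting one cat at a time can work once each subset is resampled separately and the pieces are stacked back together with `pd.concat`, because within a single cat the timestamps are unique. This is a sketch on the inlined sample data (day.month.year assumed); it uses `ffill`, the newer name for `pad`, and the `'min'` alias, which means the same one-minute step as `'60S'`.

```python
import io

import pandas as pd

RAW = """name;timestamp;activity
Charles;10.10.18 12:31;drinks
Charles;10.10.18 12:51;sleep
Charles;10.10.18 13:01;mouse
Valentine;10.10.18 12:31;drinks
Valentine;10.10.18 12:51;sleep
Valentine;10.10.18 13:01;purr
"""

df = pd.read_csv(io.StringIO(RAW), sep=";")
# Assumption: day.month.two-digit-year timestamps.
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%d.%m.%y %H:%M")

pieces = []
for name in df["name"].unique():
    # Subset to one cat: its timestamps are unique, so resampling works.
    cat = df[df["name"] == name].set_index("timestamp")
    # Upsample to one row per minute and carry the last activity forward.
    minutely = cat[["activity"]].resample("min").ffill()
    # Only 'activity' was resampled, so put the cat's name back.
    minutely["name"] = name
    pieces.append(minutely)

# Stack the per-cat pieces and restore the question's column order.
result = pd.concat(pieces).reset_index()[["name", "timestamp", "activity"]]
print(result.head())
```

The groupby-based answer below performs the same split, resample, and combine steps in a single expression.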

fuglede

You are definitely on the right track with pad. There are just a few things to keep in mind:

  • In order to resample a time series, you need your data frame index to consist of the times to be resampled.
  • Whenever you need to split up the data so that each name is treated differently, groupby is your friend.
  • When you perform an operation on a group, the grouping column becomes (part of) the index of the resulting time series, so some combination of reset_index, set_index, unstack, and stack can typically be used to massage the result into the desired form (if you don't mind the output differing slightly from your desired output, you can probably skip this step).
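To see what the third point means on the sample data: the grouped, resampled result is indexed by a (name, timestamp) MultiIndex. `reset_index` turns that back into ordinary columns, while `unstack` gives a wide table with one column per cat. A minimal sketch, with the question's data inlined and day.month.year timestamps assumed; `ffill` is used as the newer spelling of `pad`.

```python
import io

import pandas as pd

RAW = """name;timestamp;activity
Charles;10.10.18 12:31;drinks
Charles;10.10.18 12:51;sleep
Charles;10.10.18 13:01;mouse
Valentine;10.10.18 12:31;drinks
Valentine;10.10.18 12:51;sleep
Valentine;10.10.18 13:01;purr
"""

df = pd.read_csv(io.StringIO(RAW), sep=";")
# Assumption: day.month.two-digit-year timestamps.
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%d.%m.%y %H:%M")

# Select 'activity' before resampling so the grouping column is not resampled too.
resampled = (
    df.set_index("timestamp")
    .groupby("name")["activity"]
    .resample("min")
    .ffill()
)

# The grouping column has become the outer level of the index.
print(list(resampled.index.names))  # ['name', 'timestamp']

# Long form, as in the question: name, timestamp and activity as columns.
long_form = resampled.reset_index()

# Wide form: one row per minute, one column per cat.
wide_form = resampled.unstack("name")
print(wide_form.tail(3))
```

The long form matches the layout asked for; the wide form can be handier for comparing the cats minute by minute.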

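As such, you could do the following:

import pandas as pd

df['timestamp'] = pd.to_datetime(df['timestamp'], format='%d.%m.%y %H:%M')
df.set_index('timestamp').groupby('name')['activity'].resample('min').ffill().reset_index()

Here ffill is the newer name for pad (older pandas versions accept both, newer ones only ffill), 'min' is the one-minute frequency alias (the 'T' alias is deprecated in recent versions), and selecting ['activity'] before resampling keeps the name column from being resampled along with it. The explicit format assumes the timestamps are written day.month.year.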

In practice:

In [54]: df

Out[54]:
        name           timestamp activity
0    Charles 2018-10-10 12:31:00   drinks
1    Charles 2018-10-10 12:51:00    sleep
2    Charles 2018-10-10 13:01:00    mouse
3  Valentine 2018-10-10 12:31:00   drinks
4  Valentine 2018-10-10 12:51:00    sleep
5  Valentine 2018-10-10 13:01:00     purr

In [91]: df.set_index('timestamp').groupby('name').resample('T').pad().activity.reset_index().head()
Out[91]:
      name           timestamp activity
0  Charles 2018-10-10 12:31:00   drinks
1  Charles 2018-10-10 12:32:00   drinks
2  Charles 2018-10-10 12:33:00   drinks
3  Charles 2018-10-10 12:34:00   drinks
4  Charles 2018-10-10 12:35:00   drinks
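`head()` only shows the first few rows. To check the whole result on the sample data, one can count the rows per cat (12:31 through 13:01 inclusive is 31 minutes), confirm there are no gaps, and look up the activity around the change points. The sketch assumes day.month.year timestamps and uses `ffill` and `'min'`; `activity_at` is a hypothetical helper written only for this check.

```python
import io

import pandas as pd

RAW = """name;timestamp;activity
Charles;10.10.18 12:31;drinks
Charles;10.10.18 12:51;sleep
Charles;10.10.18 13:01;mouse
Valentine;10.10.18 12:31;drinks
Valentine;10.10.18 12:51;sleep
Valentine;10.10.18 13:01;purr
"""

df = pd.read_csv(io.StringIO(RAW), sep=";")
# Assumption: day.month.two-digit-year timestamps.
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%d.%m.%y %H:%M")

out = (
    df.set_index("timestamp")
    .groupby("name")["activity"]
    .resample("min")
    .ffill()
    .reset_index()
)

# One row per minute per cat.
print(out.groupby("name").size())

# Consecutive rows of each cat should be exactly one minute apart.
gaps = out.groupby("name")["timestamp"].diff().dropna()
print((gaps == pd.Timedelta(minutes=1)).all())  # True


def activity_at(name, hhmm):
    """Hypothetical helper: activity of one cat at one minute on 2018-10-10."""
    ts = pd.Timestamp(f"2018-10-10 {hhmm}")
    mask = (out["name"] == name) & (out["timestamp"] == ts)
    return out.loc[mask, "activity"].item()


print(activity_at("Charles", "12:50"))    # drinks
print(activity_at("Charles", "12:51"))    # sleep
print(activity_at("Valentine", "13:01"))  # purr
```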
