I am stuck with pandas. I want to resample data whose values are factors (categorical labels). For example, I have observed two cats named Charles and Valentine. Because each behaviour lasts for a while, an observation is only recorded when the current behaviour changes. I want to resample this to get minute-wise data:
name;timestamp;activity
Charles;10.10.18 12:31;drinks
Charles;10.10.18 12:51;sleep
Charles;10.10.18 13:01;mouse
Valentine;10.10.18 12:31;drinks
Valentine;10.10.18 12:51;sleep
Valentine;10.10.18 13:01;purr
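To follow along, the observations above can be loaded straight from the pasted text. This is a minimal sketch; it assumes the dates are day.month.two-digit-year (here both readings give 10 October 2018, but the explicit format string avoids guessing), and the variable name raw is only illustrative.

```python
import io

import pandas as pd

# The observations from the question, as posted (semicolon-separated).
raw = """name;timestamp;activity
Charles;10.10.18 12:31;drinks
Charles;10.10.18 12:51;sleep
Charles;10.10.18 13:01;mouse
Valentine;10.10.18 12:31;drinks
Valentine;10.10.18 12:51;sleep
Valentine;10.10.18 13:01;purr
"""

df = pd.read_csv(io.StringIO(raw), sep=';')
# Assumption: day.month.year with a two-digit year, as in the question.
df['timestamp'] = pd.to_datetime(df['timestamp'], format='%d.%m.%y %H:%M')
print(df)
```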
My desired output should look like this:
name timestamp activity
Charles 10.10.18 12:31 drinks
Charles 10.10.18 12:32 drinks
Charles 10.10.18 12:33 drinks
Charles 10.10.18 12:34 drinks
Charles 10.10.18 12:35 drinks
Charles 10.10.18 12:36 drinks
Charles 10.10.18 12:37 drinks
Charles 10.10.18 12:38 drinks
Charles 10.10.18 12:39 drinks
Charles 10.10.18 12:40 drinks
Charles 10.10.18 12:41 drinks
Charles 10.10.18 12:42 drinks
Charles 10.10.18 12:43 drinks
Charles 10.10.18 12:44 drinks
Charles 10.10.18 12:45 drinks
Charles 10.10.18 12:46 drinks
Charles 10.10.18 12:47 drinks
Charles 10.10.18 12:48 drinks
Charles 10.10.18 12:49 drinks
Charles 10.10.18 12:50 drinks
Charles 10.10.18 12:51 sleep
Charles 10.10.18 12:52 sleep
Charles 10.10.18 12:53 sleep
Charles 10.10.18 12:54 sleep
Charles 10.10.18 12:55 sleep
Charles 10.10.18 12:56 sleep
Charles 10.10.18 12:57 sleep
Charles 10.10.18 12:58 sleep
Charles 10.10.18 12:59 sleep
Charles 10.10.18 13:00 sleep
Charles 10.10.18 13:01 mouse
Valentine 10.10.18 12:31 drinks
Valentine 10.10.18 12:32 drinks
Valentine 10.10.18 12:33 drinks
Valentine 10.10.18 12:34 drinks
Valentine 10.10.18 12:35 drinks
Valentine 10.10.18 12:36 drinks
Valentine 10.10.18 12:37 drinks
Valentine 10.10.18 12:38 drinks
Valentine 10.10.18 12:39 drinks
Valentine 10.10.18 12:40 drinks
Valentine 10.10.18 12:41 drinks
Valentine 10.10.18 12:42 drinks
Valentine 10.10.18 12:43 drinks
Valentine 10.10.18 12:44 drinks
Valentine 10.10.18 12:45 drinks
Valentine 10.10.18 12:46 drinks
Valentine 10.10.18 12:47 drinks
Valentine 10.10.18 12:48 drinks
Valentine 10.10.18 12:49 drinks
Valentine 10.10.18 12:50 drinks
Valentine 10.10.18 12:51 sleep
Valentine 10.10.18 12:52 sleep
Valentine 10.10.18 12:53 sleep
Valentine 10.10.18 12:54 sleep
Valentine 10.10.18 12:55 sleep
Valentine 10.10.18 12:56 sleep
Valentine 10.10.18 12:57 sleep
Valentine 10.10.18 12:58 sleep
Valentine 10.10.18 12:59 sleep
Valentine 10.10.18 13:00 sleep
Valentine 10.10.18 13:01 purr
Using data.resample('60S').pad() didn't work, as pandas complains that the timestamps are not unique.
Subsetting the data to one cat at a time didn't help much either.
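The error makes sense once you look at the index: both cats were observed at the same three times, so after set_index('timestamp') every timestamp appears twice, and upsampling a non-unique index is presumably what triggers the message. A single cat's subset resamples fine; the catch is that you then have to re-attach the name and concatenate the pieces yourself. A sketch of that manual route (indexed, pieces and manual are illustrative names):

```python
import io

import pandas as pd

raw = """name;timestamp;activity
Charles;10.10.18 12:31;drinks
Charles;10.10.18 12:51;sleep
Charles;10.10.18 13:01;mouse
Valentine;10.10.18 12:31;drinks
Valentine;10.10.18 12:51;sleep
Valentine;10.10.18 13:01;purr
"""
df = pd.read_csv(io.StringIO(raw), sep=';')
df['timestamp'] = pd.to_datetime(df['timestamp'], format='%d.%m.%y %H:%M')

indexed = df.set_index('timestamp')
# Both cats share the same timestamps, so the combined index has duplicates.
print(indexed.index.is_unique)

# Per cat the index is unique, so forward-filled minute resampling works,
# but the cat's name has to be added back and the pieces concatenated by hand.
pieces = []
for name, one_cat in indexed.groupby('name'):
    filled = one_cat['activity'].resample('min').ffill()
    pieces.append(filled.to_frame().assign(name=name))
manual = pd.concat(pieces).reset_index()[['name', 'timestamp', 'activity']]
print(manual.head())
```

A groupby('name') followed by resample does this split, fill and recombine in a single chain.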
You are definitely on the right track with pad (spelled ffill() in current pandas, where Resampler.pad() is deprecated in favour of ffill()). The only things to notice are the following:
1. groupby is your friend.
2. reset_index, set_index, unstack, and stack can typically be used to massage the result into its desired form (but if you don't mind the output being slightly different from your desired output, chances are you can skip this part).
As such, you could write:
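import pandas as pd
df['timestamp'] = pd.to_datetime(df['timestamp'], format='%d.%m.%y %H:%M')  # day.month.year
# ffill() is the current name for pad(); 'min' is the minute alias (older code uses 'T')
df.set_index('timestamp').groupby('name')['activity'].resample('min').ffill().reset_index()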
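If you also want the timestamps back in the question's day.month.year notation, so the result lines up with the desired output row for row, a self-contained sketch could look like this. It uses ffill() and 'min' instead of pad() and 'T', and the name result is only illustrative.

```python
import io

import pandas as pd

raw = """name;timestamp;activity
Charles;10.10.18 12:31;drinks
Charles;10.10.18 12:51;sleep
Charles;10.10.18 13:01;mouse
Valentine;10.10.18 12:31;drinks
Valentine;10.10.18 12:51;sleep
Valentine;10.10.18 13:01;purr
"""
df = pd.read_csv(io.StringIO(raw), sep=';')
df['timestamp'] = pd.to_datetime(df['timestamp'], format='%d.%m.%y %H:%M')

result = (
    df.set_index('timestamp')
    .groupby('name')['activity']  # one time series per cat
    .resample('min')              # one-minute bins within each cat
    .ffill()                      # carry the last observed behaviour forward
    .reset_index()                # back to name / timestamp / activity columns
)
# Format the timestamps the way the question writes them.
result['timestamp'] = result['timestamp'].dt.strftime('%d.%m.%y %H:%M')
print(result.to_string(index=False))
```

Each cat's range runs from its first to its last observation, so the final behaviour (mouse, purr) gets a single row, exactly as in the desired output.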
In practice:
In [54]: df
Out[54]:
name timestamp activity
0 Charles 2018-10-10 12:31:00 drinks
1 Charles 2018-10-10 12:51:00 sleep
2 Charles 2018-10-10 13:01:00 mouse
3 Valentine 2018-10-10 12:31:00 drinks
4 Valentine 2018-10-10 12:51:00 sleep
5 Valentine 2018-10-10 13:01:00 purr
In [91]: df.set_index('timestamp').groupby('name').resample('T').pad().activity.reset_index().head()
Out[91]:
name timestamp activity
0 Charles 2018-10-10 12:31:00 drinks
1 Charles 2018-10-10 12:32:00 drinks
2 Charles 2018-10-10 12:33:00 drinks
3 Charles 2018-10-10 12:34:00 drinks
4 Charles 2018-10-10 12:35:00 drinks
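The result above is in long form (one row per cat and minute). If one column per cat is more convenient, unstack does that, and stack goes back to the long form. A sketch on the same data (long, wide and back are illustrative names):

```python
import io

import pandas as pd

raw = """name;timestamp;activity
Charles;10.10.18 12:31;drinks
Charles;10.10.18 12:51;sleep
Charles;10.10.18 13:01;mouse
Valentine;10.10.18 12:31;drinks
Valentine;10.10.18 12:51;sleep
Valentine;10.10.18 13:01;purr
"""
df = pd.read_csv(io.StringIO(raw), sep=';')
df['timestamp'] = pd.to_datetime(df['timestamp'], format='%d.%m.%y %H:%M')

# Series with a (name, timestamp) MultiIndex, as in the session above.
long = df.set_index('timestamp').groupby('name')['activity'].resample('min').ffill()

# Move the cat names from the index into the columns: one row per minute.
wide = long.unstack('name')
print(wide.head())

# And back again: one row per (timestamp, name) pair.
back = wide.stack().rename('activity').reset_index()
print(back.head())
```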