I have a data frame with three columns of interest, 'time', 'peak' and 'cycle'. I want to calculate the time elapsed between each row for a given cycle.
   time  peak  cycle
0     1     1      1
1     2     0      1
2   3.5     0      1
3   3.8     1      2
4     5     0      2
5   6.2     0      2
6     7     0      2
I want to add a fourth column, so the data frame would look like this when complete:
   time  peak  cycle  time_elapsed
0     1     1      1             0
1     2     0      1             1
2   3.5     0      1           1.5
3   3.8     1      2             0
4     5     0      2           1.2
5   6.2     0      2           1.2
6     7     0      2           0.8
The cycle number is calculated based on the peak information, so I don't think I need to refer to both columns.
data['time_elapsed'] = data['time'] - data['time'].shift()
Applying the above code I get:
   time  peak  cycle  time_elapsed
0     1     1      1             0
1     2     0      1             1
2   3.5     0      1           1.5
3   3.8     1      2           0.3
4     5     0      2           1.2
5   6.2     0      2           1.2
6     7     0      2           0.8
Is there a way to "reset" the calculation every time the value in 'peak' is 1? Any tips or advice would be appreciated!
Subtract the first value of each group from the column. GroupBy.transform with 'first' (GroupBy.first) returns those first values as a Series aligned to the original rows:
df['time_elapsed'] = df['time'].sub(df.groupby('cycle')['time'].transform('first'))
print(df)
   time  peak  cycle  time_elapsed
0     1     1      1             0
1     2     0      1             1
2     3     0      1             2
3     4     1      2             0
4     5     0      2             1
5     6     0      2             2
6     7     0      2             3
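For comparison, here is a self-contained sketch using the sample times from the question (the DataFrame construction and the column name since_cycle_start are my own assumptions for illustration). Subtracting the first value per group gives the time since the start of each cycle. If the row-to-row difference from the question's expected output is what is wanted, GroupBy.diff followed by fillna(0) is one possible way to get it:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "time": [1, 2, 3.5, 3.8, 5, 6.2, 7],
    "peak": [1, 0, 0, 1, 0, 0, 0],
    "cycle": [1, 1, 1, 2, 2, 2, 2],
})

# Time since the first row of each cycle (what transform('first') produces)
first_in_cycle = df.groupby("cycle")["time"].transform("first")
df["since_cycle_start"] = df["time"].sub(first_in_cycle)

# Difference from the previous row within the same cycle;
# the first row of each cycle has no predecessor (NaN), so fill it with 0
df["time_elapsed"] = df.groupby("cycle")["time"].diff().fillna(0)

print(df)
```

The two columns agree only on the first two rows of each cycle, so which one to keep depends on whether "elapsed" means since the previous row or since the cycle began.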
To add the reset, build a helper Series with Series.cumsum and group by it as well. This works if the peak column contains only 1 and 0 values:
s = df['peak'].cumsum()
df['time_elapsed'] = df['time'].sub(df.groupby(['cycle', s])['time'].transform('first'))
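A minimal sketch of why the cumulative sum works as a reset key, assuming peak holds only 0 and 1 and is 1 on the first row of each cycle. The running sum increases by one at every peak, so rows between two peaks share the same key. The name reset_key is illustrative, and the row-to-row diff is used here to match the expected output in the question:

```python
import pandas as pd

# Sample data copied from the question (no 'cycle' column needed here)
df = pd.DataFrame({
    "time": [1, 2, 3.5, 3.8, 5, 6.2, 7],
    "peak": [1, 0, 0, 1, 0, 0, 0],
})

# Each 1 in 'peak' starts a new group: 1, 1, 1, 2, 2, 2, 2
reset_key = df["peak"].cumsum()

# Difference from the previous row, restarting at 0 on every peak
df["time_elapsed"] = df.groupby(reset_key)["time"].diff().fillna(0)

print(df)
```

Since the key is derived from peak alone, this variant does not depend on the cycle column having been computed beforehand.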