Consider this dataframe:
import pandas as pd
df = pd.DataFrame({
'id': [458,459,464,469,507,512,516,519,519,615]
})
i want to find the difference of 2nd row - 1st row so i implemented:
df['diff'] = df['id'] - df['id'].shift(-1)
df.fillna(1)
id diff
0 458 -1.0
1 459 -5.0
2 464 -5.0
3 469 -38.0
4 507 -5.0
5 512 -4.0
6 516 -3.0
7 519 0.0
8 519 -96.0
9 615 1.0
Now i want to group these diff
column in such a way that, whenever the difference between the two rows is greater than 10
, make a new column group
and set all the above rows to 1, and so on.
As you can see in column diff
diffrence between 4th row and 3rd
id diff group
0 458 -1.0 1
1 459 -5.0 1
2 464 -5.0 1
3 469 -38.0 1
4 507 -5.0 2
5 512 -4.0 2
6 516 -3.0 2
7 519 0.0 2
8 519 -96.0 2
9 615 1.0 3
Any ideas how to achieve this?
You can use diff
, compare and then cumsum
boolean mask, last add 1
:
print (df['diff'].diff())
0 NaN
1 -4.0
2 0.0
3 -33.0
4 33.0
5 1.0
6 1.0
7 3.0
8 -96.0
9 97.0
Name: diff, dtype: float64
df['group'] = (df['diff'].diff() > 10).cumsum() + 1
print (df)
id diff group
0 458 -1.0 1
1 459 -5.0 1
2 464 -5.0 1
3 469 -38.0 1
4 507 -5.0 2
5 512 -4.0 2
6 516 -3.0 2
7 519 0.0 2
8 519 -96.0 2
9 615 1.0 3
df = df.assign(group=df['diff'].diff().gt(10).cumsum().add(1))
print (df)
id diff group
0 458 -1.0 1
1 459 -5.0 1
2 464 -5.0 1
3 469 -38.0 1
4 507 -5.0 2
5 512 -4.0 2
6 516 -3.0 2
7 519 0.0 2
8 519 -96.0 2
9 615 1.0 3
Collected from the Internet
Please contact [email protected] to delete if infringement.
Comments