I have a dataframe where I want to determine when ser_no and CTRY_NM are the same and when they differ. However, I want to be mindful of ser_no changes: I don't want a False/False pair to return True, or a False/True pair to return False.
Consider the following dataframe:
import pandas as pd

df = pd.DataFrame({'ser_no': [1, 1, 1, 2, 2, 2, 2, 3, 3, 3],
                   'CTRY_NM': ['a', 'a', 'b', 'e', 'e', 'a', 'b', 'b', 'b', 'd']})

def check(key):
    return df[key] == df[key].shift(1)

match = check('ser_no') == check('CTRY_NM')
This returns:
However, at indices 3 and 7 the serial number changes. Since each serial number is a different machine, it doesn't make sense to do a logical comparison at these locations. When ser_no changes, how can I insert NaN instead of doing a logical comparison?
Is this what you want?
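import numpy as np

def check(data, key):
    # Compare each row with the previous row of the same group.
    # Cast to float so the first row can hold NaN.
    mask = (data[key].shift(1) == data[key]).astype(float)
    mask.iloc[0] = np.nan  # the first row of a group has no predecessor
    return mask

df.groupby(by=['ser_no']).apply(lambda x: check(x, 'CTRY_NM'))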
Result:

ser_no
1       0    NaN
        1      1
        2      0
2       3    NaN
        4      1
        5      0
        6      0
3       7    NaN
        8      1
        9      0
Name: CTRY_NM, dtype: float64
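
If you'd rather keep the original flat index and avoid apply, something like the following might also work. It is only a sketch, assuming the goal is a per-row flag that is NaN on the first row of each ser_no; the names prev and same are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'ser_no': [1, 1, 1, 2, 2, 2, 2, 3, 3, 3],
                   'CTRY_NM': ['a', 'a', 'b', 'e', 'e', 'a', 'b', 'b', 'b', 'd']})

# Previous CTRY_NM within the same ser_no; the first row of each
# group has no predecessor, so shift() leaves it missing.
prev = df.groupby('ser_no')['CTRY_NM'].shift(1)

# 1.0 where the country matches the previous row, 0.0 where it differs,
# and NaN wherever there is no previous row inside the group.
same = df['CTRY_NM'].eq(prev).astype(float).where(prev.notna())

print(same)
```

Because the shift happens per group, a row is never compared against a different machine, and the result lines up with df so it can be assigned straight back as a new column.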