I need to null out values in several columns wherever their absolute value is less than the corresponding value in the threshold column.
import pandas as pd
import numpy as np

df = pd.DataFrame({'key1': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
                   'key2': [2000, 2001, 2002, 2001, 2002],
                   'data1': np.random.randn(5),
                   'data2': np.random.randn(5),
                   'threshold': [0.5, 0.4, 0.6, 0.1, 0.2]}).set_index(['key1', 'key2'])
                data1     data2  threshold
key1   key2
Ohio   2000  0.201240  0.083833        0.5
       2001 -1.993489 -1.081208        0.4
       2002  0.759038 -1.688769        0.6
Nevada 2001 -0.543916  1.412679        0.1
       2002 -1.545781  0.181224        0.2
This gives me the error "cannot join with no level specified and no overlapping names":
df.where(df.abs()>df['threshold'])
This works, but obviously only against a scalar:
df.where(df.abs()>0.5)
                data1     data2  threshold
key1   key2
Ohio   2000       NaN       NaN        NaN
       2001 -1.993489 -1.081208        NaN
       2002  0.759038 -1.688769        NaN
Nevada 2001 -0.543916  1.412679        NaN
       2002 -1.545781       NaN        NaN
BTW, this does appear to give me an OK result, but I'd still like to find out how to do it with the where() method:
df.apply(lambda x:x.where(x.abs()>x['threshold']),axis=1)
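To make the row-wise workaround above easier to check, here is a minimal sketch that swaps the random draws for the values printed earlier in the question (an assumption made only so the result is reproducible; the name result is illustrative). With axis=1, each row arrives as a Series indexed by column name, so x['threshold'] is a plain scalar for that row.

```python
import pandas as pd

# Fixed values copied from the printed frame above, instead of np.random.randn(5)
df = pd.DataFrame({'key1': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
                   'key2': [2000, 2001, 2002, 2001, 2002],
                   'data1': [0.201240, -1.993489, 0.759038, -0.543916, -1.545781],
                   'data2': [0.083833, -1.081208, -1.688769, 1.412679, 0.181224],
                   'threshold': [0.5, 0.4, 0.6, 0.1, 0.2]}).set_index(['key1', 'key2'])

# Row by row: keep a value only if its absolute value exceeds that row's threshold
result = df.apply(lambda x: x.where(x.abs() > x['threshold']), axis=1)

print(result)
```

Note that the threshold column itself comes back as all NaN, because a threshold is never strictly greater than itself.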
Here's a slightly different option using the DataFrame.gt (greater than) method:
df[df.abs().gt(df['threshold'], axis='rows')]
# Output might not look the same because of different random numbers;
# use np.random.seed() for reproducible random number generation.
Out[13]:
                data1     data2  threshold
key1   key2
Ohio   2000       NaN       NaN        NaN
       2001  1.954543  1.372174        NaN
       2002       NaN       NaN        NaN
Nevada 2001  0.275814  0.854617        NaN
       2002       NaN  0.204993        NaN
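Since the question asked specifically about where(), the same mask can also be passed to it directly. The sketch below is not part of the original answer: it reuses the values printed in the question instead of random draws, and the names mask, via_where, via_index, cols and kept are illustrative. The key point is axis=0 (equivalently 'rows' or 'index'), which compares the threshold Series along the row index, whereas a plain > lines a Series up against the column labels. The last step shows one possible way to keep the threshold column intact by masking only the data columns.

```python
import pandas as pd

# Fixed values copied from the question's printed frame (assumption for reproducibility)
df = pd.DataFrame({'key1': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
                   'key2': [2000, 2001, 2002, 2001, 2002],
                   'data1': [0.201240, -1.993489, 0.759038, -0.543916, -1.545781],
                   'data2': [0.083833, -1.081208, -1.688769, 1.412679, 0.181224],
                   'threshold': [0.5, 0.4, 0.6, 0.1, 0.2]}).set_index(['key1', 'key2'])

# Boolean frame: True where |value| > that row's threshold
mask = df.abs().gt(df['threshold'], axis=0)

via_where = df.where(mask)   # the where() form asked about in the question
via_index = df[mask]         # the boolean-indexing form used in the answer

# Optional: mask only the data columns so that threshold keeps its values
cols = ['data1', 'data2']
kept = df.copy()
kept[cols] = df[cols].where(mask[cols])

print(via_where)
print(kept)
```

Both via_where and via_index null out the threshold column as well, which is why the last step restricts the mask to data1 and data2.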