I have a pandas DataFrame that looks like this, for example:
label Y88_N diff div fold
0 25273.626713 17348.581851 2.016404 2.016404
1 29139.510491 -4208.868050 0.604304 -0.604304
2 34388.439717 -30147.834699 0.458903 -0.458903
3 69704.254089 -32976.152490 0.116894 -0.116894
4 193717.440783 -71359.494098 0.286045 -0.286045
5 28996.634708 10934.944533 2.031293 2.031293
6 45021.782930 680.437629 1.056383 1.056383
but with thousands of rows. I would like to get a new DataFrame containing only the rows where the value in the 'fold' column is > 2 OR < -0.6. So at the end the DataFrame should look like this:
label Y88_N diff div fold
0 25273.626713 17348.581851 2.016404 2.016404
1 29139.510491 -4208.868050 0.604304 -0.604304
5 28996.634708 10934.944533 2.031293 2.031293
I have tried different things like:
def ranged(start, end, step):
    x = start
    while x < end:
        yield x
        x += step
df2 = df[~df['fold'].isin(ranged(-0.6, 2, 0.000001))]
or
df2 = df[(df['fold'] >= 2) & (df['fold'] <= -0.6)]
But nothing seems to work. Is there an easy way to select the rows where a column's values match either filter 1 OR filter 2? Thanks.
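You could combine the two conditions with | (element-wise OR) instead of &, since no value can be both >= 2 and <= -0.6: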
In [276]: df[(df['fold'] >= 2) | (df['fold'] <= -0.6)]
Out[276]:
label Y88_N diff div fold
0 0 25273.626713 17348.581851 2.016404 2.016404
1 1 29139.510491 -4208.868050 0.604304 -0.604304
5 5 28996.634708 10934.944533 2.031293 2.031293
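For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the sample rows from the question (the names `mask`, `df2` and `empty` are my own, not from the original post) and contrasts the `|` filter with the `&` attempt, which should select nothing.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "label": [0, 1, 2, 3, 4, 5, 6],
    "Y88_N": [25273.626713, 29139.510491, 34388.439717, 69704.254089,
              193717.440783, 28996.634708, 45021.782930],
    "diff": [17348.581851, -4208.868050, -30147.834699, -32976.152490,
             -71359.494098, 10934.944533, 680.437629],
    "div": [2.016404, 0.604304, 0.458903, 0.116894,
            0.286045, 2.031293, 1.056383],
    "fold": [2.016404, -0.604304, -0.458903, -0.116894,
             -0.286045, 2.031293, 1.056383],
})

# | is element-wise OR; each comparison needs its own parentheses
# because | binds more tightly than >= and <= in Python
mask = (df["fold"] >= 2) | (df["fold"] <= -0.6)
df2 = df[mask]

# & is element-wise AND: no value satisfies both conditions at once
empty = df[(df["fold"] >= 2) & (df["fold"] <= -0.6)]

print(df2)
print(len(empty))
```

Keeping the mask in its own variable also makes it easy to invert with `~mask` if you later need the rows that were filtered out.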
Or use the query method, like
In [277]: df.query('fold >=2 | fold <=-0.6')
Out[277]:
label Y88_N diff div fold
0 0 25273.626713 17348.581851 2.016404 2.016404
1 1 29139.510491 -4208.868050 0.604304 -0.604304
5 5 28996.634708 10934.944533 2.031293 2.031293
And pd.eval() works well with expressions containing large arrays:
In [278]: df[pd.eval('df.fold >=2 | df.fold <=-0.6')]
Out[278]:
label Y88_N diff div fold
0 0 25273.626713 17348.581851 2.016404 2.016404
1 1 29139.510491 -4208.868050 0.604304 -0.604304
5 5 28996.634708 10934.944533 2.031293 2.031293
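As a quick sanity check, the three approaches can be compared on the same data. This is only a sketch: it uses just the 'fold' values from the question (the other columns are left out for brevity), and the names `by_mask`, `by_query` and `by_eval` are made up for illustration.

```python
import pandas as pd

# Only the 'fold' column from the question's sample data
df = pd.DataFrame({"fold": [2.016404, -0.604304, -0.458903, -0.116894,
                            -0.286045, 2.031293, 1.056383]})

# 1. Plain boolean mask
by_mask = df[(df["fold"] >= 2) | (df["fold"] <= -0.6)]

# 2. DataFrame.query: column names are used directly inside the string
by_query = df.query("fold >= 2 | fold <= -0.6")

# 3. pd.eval: builds the boolean mask from a string expression
by_eval = df[pd.eval("df.fold >= 2 | df.fold <= -0.6")]

# Check whether all three select the same rows
print(by_mask.equals(by_query) and by_mask.equals(by_eval))
```

Inside query and pd.eval strings the comparisons do not need parentheses, because the default pandas parser evaluates them before `|`. In plain Python the parentheses are required.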