Pandas: How to extract rows of a dataframe matching filter 1 OR filter 2


Jérémz

I have a pandas dataframe that looks like this, for example:

label          Y88_N          diff       div      fold  
0       25273.626713  17348.581851  2.016404  2.016404  
1       29139.510491  -4208.868050  0.604304 -0.604304  
2       34388.439717 -30147.834699  0.458903 -0.458903  
3       69704.254089 -32976.152490  0.116894 -0.116894  
4      193717.440783 -71359.494098  0.286045 -0.286045  
5       28996.634708  10934.944533  2.031293  2.031293  
6       45021.782930    680.437629  1.056383  1.056383

but with thousands of rows. I would like to get a new dataframe containing only the rows where the value in the 'fold' column is > 2 OR < -0.6. The resulting dataframe should look like this:

label          Y88_N          diff       div      fold  
0       25273.626713  17348.581851  2.016404  2.016404  
1       29139.510491  -4208.868050  0.604304 -0.604304  
5       28996.634708  10934.944533  2.031293  2.031293

I have tried different things like:

def ranged(start, end, step):
    x = start
    while x < end:
        yield x
        x += step
df2 = df[~df['fold'].isin(ranged(-0.6, 2, 0.000001))]

df2 = df[(df['fold'] >= 2) & (df['fold'] <= -0.6)]

But nothing seems to work. Is there an easy way to select the rows whose value in a column matches either filter 1 OR filter 2? Thanks

Zero

You could combine the two conditions with | (element-wise OR) instead of & (element-wise AND):

In [276]: df[(df['fold'] >= 2) | (df['fold'] <= -0.6)]
Out[276]:
   label         Y88_N          diff       div      fold
0      0  25273.626713  17348.581851  2.016404  2.016404
1      1  29139.510491  -4208.868050  0.604304 -0.604304
5      5  28996.634708  10934.944533  2.031293  2.031293
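
To make the difference between | and & concrete, here is a minimal, self-contained sketch. It rebuilds the sample frame from the values shown in the question (only the Y88_N and fold columns, which is a simplification for illustration) and runs both masks:

```python
import pandas as pd

# Sample data copied from the question; the other columns are left out.
df = pd.DataFrame({
    "Y88_N": [25273.626713, 29139.510491, 34388.439717, 69704.254089,
              193717.440783, 28996.634708, 45021.782930],
    "fold": [2.016404, -0.604304, -0.458903, -0.116894,
             -0.286045, 2.031293, 1.056383],
})

# | is element-wise OR: a row is kept if either condition holds.
either = df[(df["fold"] >= 2) | (df["fold"] <= -0.6)]
print(either.index.tolist())  # [0, 1, 5]

# & is element-wise AND: no value can be both >= 2 and <= -0.6,
# so the mask from the question never selects anything.
both = df[(df["fold"] >= 2) & (df["fold"] <= -0.6)]
print(both.empty)  # True
```

The parentheses around each comparison are needed because | and & bind more tightly than >= and <= in plain Python.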

Or use the query method:

In [277]: df.query('fold >=2 | fold <=-0.6')
Out[277]:
   label         Y88_N          diff       div      fold
0      0  25273.626713  17348.581851  2.016404  2.016404
1      1  29139.510491  -4208.868050  0.604304 -0.604304
5      5  28996.634708  10934.944533  2.031293  2.031293
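
If the thresholds live in Python variables, query can reference them with the @ prefix. The sketch below assumes two hypothetical variable names, lower and upper, and uses only the fold column from the question:

```python
import pandas as pd

# 'fold' values copied from the question.
df = pd.DataFrame({
    "fold": [2.016404, -0.604304, -0.458903, -0.116894,
             -0.286045, 2.031293, 1.056383],
})

# Hypothetical threshold names, chosen for this example only.
lower = -0.6
upper = 2

# @name looks up a Python variable from the calling scope.
selected = df.query("fold >= @upper or fold <= @lower")
print(selected.index.tolist())  # [0, 1, 5]
```

Inside a query string, or and | can be used interchangeably for this kind of condition.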

pd.eval() also works, and it performs well on expressions involving large arrays:

In [278]: df[pd.eval('df.fold >=2 | df.fold <=-0.6')]
Out[278]:
   label         Y88_N          diff       div      fold
0      0  25273.626713  17348.581851  2.016404  2.016404
1      1  29139.510491  -4208.868050  0.604304 -0.604304
5      5  28996.634708  10934.944533  2.031293  2.031293
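
As a quick sanity check, the three approaches can be compared on the sample data. This is only a sketch: passing local_dict to pd.eval is a choice made here so that the lookup of df is explicit, and the parentheses in the expression are added for readability.

```python
import pandas as pd

# 'fold' values copied from the question.
df = pd.DataFrame({
    "fold": [2.016404, -0.604304, -0.458903, -0.116894,
             -0.286045, 2.031293, 1.056383],
})

# 1. Boolean mask built with |
by_mask = df[(df["fold"] >= 2) | (df["fold"] <= -0.6)]

# 2. DataFrame.query with the condition as a string
by_query = df.query("fold >= 2 | fold <= -0.6")

# 3. pd.eval returns a boolean Series that is then used as a mask
mask = pd.eval("(df.fold >= 2) | (df.fold <= -0.6)", local_dict={"df": df})
by_eval = df[mask]

# All three selections contain the same rows.
print(by_mask.equals(by_query) and by_mask.equals(by_eval))
```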
