Fastest way to filter a pandas dataframe on multiple columns

RoachLord

I have a pandas dataframe with several columns that label the data in a final column, for example,

df = pd.DataFrame( {'1_label' : ['a1','b1','c1','d1'],
                    '2_label' : ['a2','b2','c2','d2'],
                    '3_label' : ['a3','b3','c3','d3'],
                    'data'    : [1,2,3,4]})

df =      1_label 2_label 3_label  data
     0      a1      a2      a3     1
     1      b1      b2      b3     2
     2      c1      c2      c3     3
     3      d1      d2      d3     4

and a list of tuples,

list_t = [('a1','a2','a3'), ('d1','d2','d3')]

I want to filter this dataframe and return a new dataframe containing only the rows that correspond to the tuples in my list.

result =        1_label 2_label 3_label  data
            0      a1      a2      a3     1
            1      d1      d2      d3     4

My naive (and C++ inspired) solution was to use append (like vector::push_back):

# Note: DataFrame.append was removed in pandas 2.0, so this only runs on older versions.
result = pd.DataFrame()
for l1, l2, l3 in list_t:
    # Rows whose three labels all match the current tuple
    match = df[(df['1_label'] == l1) &
               (df['2_label'] == l2) &
               (df['3_label'] == l3)]
    if not match.empty:
        result = result.append(match)

While my solution works, I suspect it is horrendously slow for large dataframes and long lists of tuples, since I think pandas creates a new dataframe on each call to append. Could anyone suggest a faster/cleaner way to do this? Thanks!

Ilja Everilä

Assuming the label combinations are unique, you could build an index from the columns you want to filter on and select the tuples with .loc:

In [10]: df
Out[10]: 
  1_label 2_label 3_label  data
0      a1      a2      a3     1
1      b1      b2      b3     2
2      c1      c2      c3     3
3      d1      d2      d3     4

In [11]: df.set_index(['1_label', '2_label', '3_label'])\
    .loc[[('a1','a2','a3'), ('d1','d2','d3')]]\
    .reset_index()
Out[11]: 
  1_label 2_label 3_label  data
0      a1      a2      a3     1
1      d1      d2      d3     4
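
If some tuples in the list might not occur in the dataframe (the emptiness check in the question suggests they might), the .loc lookup above may raise a KeyError on recent pandas versions. One possible alternative, offered here only as a sketch and not as part of the original answer, is to build a boolean mask with MultiIndex.isin. The name label_cols and the extra non-matching tuple are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({'1_label': ['a1', 'b1', 'c1', 'd1'],
                   '2_label': ['a2', 'b2', 'c2', 'd2'],
                   '3_label': ['a3', 'b3', 'c3', 'd3'],
                   'data': [1, 2, 3, 4]})

# The last tuple is a made-up example with no matching row in df.
list_t = [('a1', 'a2', 'a3'), ('d1', 'd2', 'd3'), ('x1', 'x2', 'x3')]

# Hypothetical helper name for the columns to filter on.
label_cols = ['1_label', '2_label', '3_label']

# Build a MultiIndex from the label columns and test each row's
# (1_label, 2_label, 3_label) tuple for membership in list_t.
mask = pd.MultiIndex.from_frame(df[label_cols]).isin(list_t)

# Keep the matching rows and renumber them from 0.
result = df[mask].reset_index(drop=True)
print(result)
```

Unlike the .loc approach, the rows come back in the dataframe's original order rather than in the order of the tuples, and duplicated label combinations are all kept.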
