I have a dataframe containing counts of two things, which I've put in columns numA and numB. I want to find the rows where numA < x and numB < y, which can be done like so:
filtered_df = df[(df.numA < x) & (df.numB < y)]
This works when both numA and numB are present. However, neither column is guaranteed to appear in the dataframe. If only one column exists, I would still like to filter the rows based on it. This could be easily coded with something along the lines of
filtered_df = df
if "numA" in df.columns:
    filtered_df = filtered_df[filtered_df.numA < x]
if "numB" in df.columns:
    filtered_df = filtered_df[filtered_df.numB < y]
But this seems very inefficient, especially since in reality I have 9 columns like this, and each of these requires the same check. Is there a way to achieve the same thing but with code that is more readable, easier to maintain and less tedious to write out?
If you want an all-or-nothing type comparison, I think a fairly easy way is to use a set comparison:
if set(list_of_cols_to_check).issubset(df.columns):
    filtered_df = df[(df.numA < x) & ... & (df.numB < y)]
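As a concrete, self-contained sketch of that all-or-nothing check (the column names and thresholds here are made up for illustration):

```python
import pandas as pd

# Toy data: both required columns happen to be present here.
df = pd.DataFrame({"numA": [1, 5, 3], "numB": [0, 1, 4]})

required = {"numA", "numB"}
if required.issubset(df.columns):
    # Only filter when every required column exists.
    filtered_df = df[(df.numA < 4) & (df.numB < 2)]
else:
    filtered_df = df  # fall back to the unfiltered frame

print(filtered_df)
```

If any required column is missing, the frame is returned unfiltered rather than raising an AttributeError.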
If you want to perform comparisons for all the columns that do exist, it gets a bit more complicated. It is not very different from what you have, but I'd probably do it as follows:
mask = pd.Series(True, index=df.index)  # start from an all-True mask
mask = mask & (df.numA < 4) if 'numA' in df else mask
mask = mask & (df.numB < 2) if 'numB' in df else mask
mask = mask & (df.numC < 1) if 'numC' in df else mask
df[mask]
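Since in practice there are nine such columns, the repeated lines can be driven by a dict mapping column names to upper bounds. A minimal sketch (the `thresholds` dict and the sample data are hypothetical):

```python
import pandas as pd

# Toy data: "numB" is deliberately absent to show the skip behaviour.
df = pd.DataFrame({"numA": [1, 5, 3], "numC": [0, 0, 2]})

# Hypothetical mapping: column name -> exclusive upper bound.
thresholds = {"numA": 4, "numB": 2, "numC": 1}

mask = pd.Series(True, index=df.index)
for col, bound in thresholds.items():
    if col in df.columns:  # silently skip columns that are absent
        mask &= df[col] < bound

filtered_df = df[mask]
print(filtered_df)
```

Adding a tenth column is then a one-line change to `thresholds` rather than a new `if` block.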