I have a dataframe containing counts of two things, which I've put in columns numA and numB. I want to find the rows where numA < x and numB < y, which can be done like so:
filtered_df = df[(df.numA < x) & (df.numB < y)]
This works when both numA and numB are present. However, neither column is guaranteed to appear in the dataframe. If only one column exists, I would still like to filter the rows based on it. This could be easily coded with something along the lines of
filtered_df = df
if "numA" in df.columns:
    filtered_df = filtered_df[filtered_df.numA < x]
if "numB" in df.columns:
    filtered_df = filtered_df[filtered_df.numB < y]
But this seems very inefficient, especially since in reality I have 9 columns like this, and each of these requires the same check. Is there a way to achieve the same thing but with code that is more readable, easier to maintain and less tedious to write out?
If you want an all-or-nothing type comparison, I think a fairly easy way is to use a set comparison:
if set(list_of_cols_to_check).issubset(df.columns):
    filtered_df = df[(df.numA < x) & ... & (df.numB < y)]
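As a concrete, self-contained sketch of that all-or-nothing check (the column names and thresholds here are made up for illustration):

```python
import pandas as pd

# Toy data: both required columns happen to be present here.
df = pd.DataFrame({"numA": [1, 5, 3], "numB": [0, 1, 4]})

required = {"numA", "numB"}
if required.issubset(df.columns):
    # Only filter when every required column exists.
    filtered_df = df[(df.numA < 4) & (df.numB < 2)]
else:
    filtered_df = df  # fall back to the unfiltered frame

print(filtered_df)
```

If any required column is missing, the frame is returned unfiltered rather than raising an AttributeError.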
If you want to perform comparisons for all the columns that do exist, it gets a bit more complicated. It is not very different from what you have, but I'd probably do it as follows:
mask = pd.Series(True, index=df.index)  # start from an all-True mask
mask = mask & (df.numA < 4) if 'numA' in df else mask
mask = mask & (df.numB < 2) if 'numB' in df else mask
mask = mask & (df.numC < 1) if 'numC' in df else mask
df[mask]
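Since in practice there are nine such columns, the repeated lines can be driven by a dict mapping column names to upper bounds. A minimal sketch (the `thresholds` dict and the sample data are hypothetical):

```python
import pandas as pd

# Toy data: "numB" is deliberately absent to show the skip behaviour.
df = pd.DataFrame({"numA": [1, 5, 3], "numC": [0, 0, 2]})

# Hypothetical mapping: column name -> exclusive upper bound.
thresholds = {"numA": 4, "numB": 2, "numC": 1}

mask = pd.Series(True, index=df.index)
for col, bound in thresholds.items():
    if col in df.columns:  # silently skip columns that are absent
        mask &= df[col] < bound

filtered_df = df[mask]
print(filtered_df)
```

Adding a tenth column is then a one-line change to `thresholds` rather than a new `if` block.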