I have
   Apple f2 m  Apple f2 t  Apple f3 m  Apple f3 t
0           3           4           5           3
1          12           7           4           7
2           5           9           7           5
3           3           3           4           8
4           7           1           2           6
I would like to select the columns whose names match 'Apple f* m' and run a t-test against the columns whose names match 'Apple f* t'.
I have tried
ttest_ind(df.loc[:, df.columns.str.contains('Apple f* m')], df.loc[:, df.columns.str.contains('Apple f* t')])
However, it doesn't recognise my wildcard as a wildcard.
Thank you if you can help me solve this problem or point me in the right direction.
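For future reference: pandas.Series.str.contains has its regex parameter set to True by default, so the pattern is interpreted as a regular expression rather than a shell-style wildcard. In a regex, * on its own does not mean "anything"; 'f*' means "zero or more f characters", which is why 'Apple f* m' does not match 'Apple f2 m'.
To match 0 or more of any character, use .* instead (ref. Alan Moore):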
.* just means "0 or more of any character"
It's broken down into two parts:
. - a "dot" indicates any character
* - means "0 or more instances of the preceding regex token"
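As a quick check of the difference, here is a minimal sketch using Python's standard re module on the column names from the question. The variable names are only for illustration; str.contains searches for the pattern anywhere in the string, which is what re.search does here.

```python
import re

# Column names taken from the question's DataFrame
columns = ["Apple f2 m", "Apple f2 t", "Apple f3 m", "Apple f3 t"]

# 'f*' means "zero or more 'f' characters", so the digit after 'f' is never matched
glob_style = re.compile(r"Apple f* m")
# '.*' means "zero or more of any character", which covers the '2' or '3'
regex_style = re.compile(r"Apple f.* m")

glob_hits = [c for c in columns if glob_style.search(c)]
regex_hits = [c for c in columns if regex_style.search(c)]

print(glob_hits)   # []
print(regex_hits)  # ['Apple f2 m', 'Apple f3 m']
```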
Here is a link to regex101 where you can test regex expressions:
https://regex101.com/r/QNjkch/1
Finally, we can simplify your code. Consider this simple example:
import pandas as pd
df = pd.DataFrame(columns=["a1a","a2a","a1b"])
mask = df.columns.str.contains('a.*a')
df.loc[:, mask]   # selects the columns matching the pattern (a1a, a2a)
df.loc[:, ~mask]  # selects the remaining columns by inverting the mask with ~ (a1b)
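Applied to the DataFrame from the question, the same idea might look like the sketch below. It assumes scipy is installed and that ttest_ind is scipy.stats.ttest_ind. Pooling all 'm' values against all 't' values is my assumption about what is wanted; passing the two DataFrames directly would instead test column by column.

```python
import pandas as pd
from scipy.stats import ttest_ind

# Data copied from the question
df = pd.DataFrame({
    "Apple f2 m": [3, 12, 5, 3, 7],
    "Apple f2 t": [4, 7, 9, 3, 1],
    "Apple f3 m": [5, 4, 7, 4, 2],
    "Apple f3 t": [3, 7, 5, 8, 6],
})

# '.*' stands in for the part of the name that varies (here '2' or '3')
m_mask = df.columns.str.contains(r"Apple f.* m")
t_mask = df.columns.str.contains(r"Apple f.* t")

m_cols = df.loc[:, m_mask]
t_cols = df.loc[:, t_mask]

# Assumption: flatten each group into one sample before running the t-test
result = ttest_ind(m_cols.to_numpy().ravel(), t_cols.to_numpy().ravel())
print(result.statistic, result.pvalue)
```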