I am very new to the pandas Python module and have a problem I'm trying to solve. Take the following dataframe as an example. It was read in from a .csv where "link" is the column header for the last three columns:
  summary       link     link.1     link.2
0    test  PCR-12345  PCR-54321  PCR-65432
1   test2        NaN        NaN        NaN
2   test3    DR-1234   PCR-1244        NaN
3   test4   PCR-4321    DR-4321        NaN
My goal is to update the dataframe to the following:
  summary     link   link.1  link.2
0    test      NaN      NaN     NaN
1   test2      NaN      NaN     NaN
2   test3  DR-1234      NaN     NaN
3   test4      NaN  DR-4321     NaN
So the criterion is basically: if the column header starts with "link" ("link", "link.1", "link.2", and so on) AND the value is a string that starts with "PCR-", update it to an empty/NaN value.
How do I loop through each row's values, check the header and value, and replace if criteria is satisfied?
Let's try pd.Series.str.startswith and pd.Series.mask:
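# columns whose name starts with `link`
cols = df.columns[df.columns.str.startswith('link')]
# in each `link` column, replace values starting with `PCR-` with NaN;
# na=False makes missing values count as "does not start with PCR-"
df[cols] = df[cols].apply(lambda x: x.mask(x.str.startswith('PCR-', na=False)))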
Output:
  summary     link   link.1  link.2
0    test      NaN      NaN     NaN
1   test2      NaN      NaN     NaN
2   test3  DR-1234      NaN     NaN
3   test4      NaN  DR-4321     NaN
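For anyone who wants to try this without the original .csv, here is a minimal self-contained sketch. It rebuilds the example frame by hand (assuming the column names pandas produces for duplicate "link" headers) and wraps the same masking idea in a helper. The helper name `clear_prefixed` and its parameters are made up for illustration. It checks each value with `isinstance` rather than the `.str` accessor, on the assumption that a `link` column that is entirely empty in the .csv may be read as a float column, where `.str` is not available.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question by hand.
df = pd.DataFrame({
    "summary": ["test", "test2", "test3", "test4"],
    "link": ["PCR-12345", np.nan, "DR-1234", "PCR-4321"],
    "link.1": ["PCR-54321", np.nan, "PCR-1244", "DR-4321"],
    "link.2": ["PCR-65432", np.nan, np.nan, np.nan],
})


def clear_prefixed(frame, col_prefix="link", value_prefix="PCR-"):
    """Hypothetical helper: return a copy of `frame` in which string values
    starting with `value_prefix` are set to NaN, but only in columns whose
    name starts with `col_prefix`."""
    out = frame.copy()
    for col in out.columns:
        if not str(col).startswith(col_prefix):
            continue  # leave non-link columns (e.g. `summary`) alone
        # True only for real strings with the prefix; NaN and numbers give False.
        hit = out[col].map(
            lambda v: isinstance(v, str) and v.startswith(value_prefix)
        ).astype(bool)
        # mask() replaces the values where `hit` is True with NaN.
        out[col] = out[col].mask(hit)
    return out


result = clear_prefixed(df)
print(result)
```

The helper returns a new frame and leaves `df` unchanged; assign the result back (`df = clear_prefixed(df)`) if you want to update it in place of the original.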