I'm trying to manipulate a DataFrame with 8 columns and 263,000 rows.
This is what my DF looks like:
ID1 ID2 dN dS t Label_ID1 Label_ID2 Group
QJY77946 NP_073551 0.0241 0.1402 0.1479 229E-CoV 229E-CoV Intra
QJY77954 NP_073551 0.0119 0.0912 0.0870 229E-CoV 229E-CoV Intra
QJY77954 QJY77946 0.0119 0.0439 0.0566 229E-CoV 229E-CoV Intra
QJY77962 NP_073551 0.0119 0.0912 0.0870 229E-CoV 229E-CoV Intra
QJY77962 QJY77946 0.0119 0.0439 0.0566 229E-CoV 229E-CoV Intra
My goal is to filter on the values <= 6 in the columns "dN", "dS" and "t". To do this, I select the rows where any of the selected columns (dN, dS and t) has a value <= 6.
import pandas as pd
df_1_S = pd.read_csv("S_YN00.csv", sep="\t", names=["ID1", "ID2", "dN", "dS", "t", "Label_ID1", "Label_ID2", "Group"])
# Boolean mask: True for rows where dN is below 6
S_greather_than = (df_1_S["dN"] < 6)
df_1_S.loc[S_greather_than]
This works, but when I try to add more columns (dS and t):
S_greather_than = (df_1_S["dN"] < 6) & (df_1_S["dS"] < 6) & (df_1_S["t"] < 6)
df_1_S.loc[S_greather_than]
A different method, using OR (|):
S_greather_than = (df_1_S["dN"] < 6) | (df_1_S["dS"] < 6) | (df_1_S["t"] < 6)
df_1_S.loc[S_greather_than]
I get this error:
TypeError: '<' not supported between instances of 'str' and 'int'
I understand the problem, but I don't know how to filter the rows with values <= 6 in all three columns at the same time.
Any idea or help is welcome.
Thanks!
Change the data type of the 'dS' column to float:
df_1_S['dS'] = df_1_S['dS'].astype(float)
The error is probably caused by the 'dS' column having dtype 'object' (it holds strings), as mentioned in the comments; comparing a string with the integer 6 is what raises the TypeError.
Your code should work fine with this change.
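If astype(float) itself fails with a ValueError, some 'dS' entries are text that cannot be read as a number. A more forgiving alternative (a swapped-in technique, not part of the answer above) is pd.to_numeric with errors="coerce", which turns unparseable entries into NaN. The sketch below assumes the file has no header row, and its sample rows are hypothetical: adapted from the question, with dS values altered for illustration and "-" standing in for whatever non-numeric text the real file contains. It also applies the <= 6 test to all three columns at once.

```python
import io

import pandas as pd

# Hypothetical stand-in for S_YN00.csv: same tab-separated layout, no header row.
# The "-" in dS (row 2) is an assumed non-numeric entry; one such value is enough
# to make pandas read the whole dS column as dtype object (strings).
raw = (
    "QJY77946\tNP_073551\t0.0241\t0.1402\t0.1479\t229E-CoV\t229E-CoV\tIntra\n"
    "QJY77954\tNP_073551\t0.0119\t-\t0.0870\t229E-CoV\t229E-CoV\tIntra\n"
    "QJY77962\tQJY77946\t0.0119\t7.5000\t0.0566\t229E-CoV\t229E-CoV\tIntra\n"
)
cols = ["ID1", "ID2", "dN", "dS", "t", "Label_ID1", "Label_ID2", "Group"]
df_1_S = pd.read_csv(io.StringIO(raw), sep="\t", names=cols)
print(df_1_S.dtypes)  # dS shows up as object here

num_cols = ["dN", "dS", "t"]
# errors="coerce" turns anything that is not a number into NaN instead of raising,
# unlike astype(float), which raises ValueError on a string such as "-".
df_1_S[num_cols] = df_1_S[num_cols].apply(pd.to_numeric, errors="coerce")

# IDs of rows where at least one of the three values could not be parsed.
print(df_1_S.loc[df_1_S[num_cols].isna().any(axis=1), "ID1"].tolist())

# Rows where every one of dN, dS and t is <= 6 (same as chaining with &).
# NaN <= 6 evaluates to False, so rows with an unparseable value are dropped here.
all_le_6 = df_1_S[num_cols].le(6).all(axis=1)
# Rows where at least one of them is <= 6 (same as chaining with |).
any_le_6 = df_1_S[num_cols].le(6).any(axis=1)

print(df_1_S.loc[all_le_6])
print(df_1_S.loc[any_le_6])
```

Using .le(6) keeps the <= comparison from the question, and .all(axis=1) versus .any(axis=1) chooses between the & and | versions without repeating the column names. Check how many NaN values the coercion produced before filtering, since those rows silently drop out of the "all" mask.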