In a pandas DataFrame, I have a column ['name'] containing various operating system classifications such as 'Windows 7', 'Windows 10', 'Linux', 'Mobile iOS 9.1', 'OS X 10.12', etc., all stored as strings.
I am hoping to use this function to create a new column ['type'] that will be a more generalized version:
def name_group(row):
    if 'Windows' in row:
        name = 'Microsoft Windows'
    elif 'iOS' in row:
        name = 'Apple iOS'
    elif 'OS X' in row:
        name = 'Apple Macintosh'
    elif 'Macintosh' in row:
        name = 'Apple Macintosh'
    elif 'Linux' in row:
        name = 'GNU/Linux'
    else:
        name = 'Other'
    return name
It works correctly when I test the function by passing in a single string variable. But for some reason, when I apply the function to the DataFrame like this, it returns 'Other' for every row:
new_df['type'] = new_df.apply(name_group, axis=1)
Any thoughts on what could be causing this?
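With DataFrame.apply and axis=1, each row is passed to name_group as a Series, and the in operator on a Series checks its index labels (here, the column names) rather than its values, so every test fails and you get 'Other'. You need to select the column and use Series.apply: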
new_df['type'] = new_df['name'].apply(name_group)
If you want to use DataFrame.apply instead, you need a lambda function that passes the value of the 'name' column to name_group:
new_df['type'] = new_df.apply(lambda x: name_group(x['name']), axis=1)
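To see the difference side by side, here is a minimal sketch that runs the original call and both fixes on a small DataFrame. The sample rows (including 'Solaris') are made up for illustration and are not from the question; name_group is the function from the question, unchanged.

```python
import pandas as pd


def name_group(row):
    # Same function as in the question: substring tests on a string.
    if 'Windows' in row:
        name = 'Microsoft Windows'
    elif 'iOS' in row:
        name = 'Apple iOS'
    elif 'OS X' in row:
        name = 'Apple Macintosh'
    elif 'Macintosh' in row:
        name = 'Apple Macintosh'
    elif 'Linux' in row:
        name = 'GNU/Linux'
    else:
        name = 'Other'
    return name


# Hypothetical sample data, assumed only for this example.
new_df = pd.DataFrame({
    'name': ['Windows 7', 'Windows 10', 'Linux',
             'Mobile iOS 9.1', 'OS X 10.12', 'Solaris'],
})

# Original call: each row arrives as a Series, so `in` tests the index
# labels (['name']) and none of the substring checks can match.
wrong = new_df.apply(name_group, axis=1)

# Fix 1: apply to the column, so the function receives each string.
by_series = new_df['name'].apply(name_group)

# Fix 2: keep DataFrame.apply, but hand the function the cell value.
by_lambda = new_df.apply(lambda x: name_group(x['name']), axis=1)

new_df['type'] = by_series
print(new_df)
```

Both fixes should give the same result; the Series.apply form is usually the simpler choice when only one column is involved.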