I create two dataframes:
import pandas as pd
data = [['John'], ['Mary']]
df1 = pd.DataFrame(data, columns = ['Name'])
df1['Height'] = 0
data = [['John', 5], ['Mary', 6]]
df2 = pd.DataFrame(data, columns = ['Name', 'Height'])
df1
Output:
Name Height
0 John 0
1 Mary 0
df2
Output:
Name Height
0 John 5
1 Mary 6
Now I try to fill in df1's Height using the values from df2:
df1['Height'] = df1.apply(lambda row: df2[df2.Name == row.Name]['Height'], axis = 1)
df1
Output:
Name Height
0 John 5
1 Mary NaN
Why does only the first name (John) have the Height filled in? Shouldn't apply() iterate through all the rows of df1 and return the Height from df2 where df2 matches the name in the current row of df1?
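The problem is that df2[df2.Name == row.Name]['Height']
returns a Series that keeps its index label from df2, so each row yields a Series with a different index. When pandas combines these Series, it aligns them by label, and each distinct label becomes its own column. In particular: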
df1.apply(lambda row: df2[df2.Name == row.Name]['Height'], axis = 1)
returns:
0 1
0 5.0 NaN
1 NaN 6.0
and it looks like pandas takes the first column to assign when you do:
df1['Height'] = ...
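To see the index mismatch directly, here is a minimal sketch that rebuilds the two frames from the question and inspects what each lookup returns. The variable names john, mary and result are only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['John', 'Mary'], 'Height': [0, 0]})
df2 = pd.DataFrame({'Name': ['John', 'Mary'], 'Height': [5, 6]})

# Each lookup is a one-element Series that keeps its row label from df2.
john = df2[df2.Name == 'John']['Height']
mary = df2[df2.Name == 'Mary']['Height']
print(list(john.index), list(mary.index))

# apply() aligns the returned Series by label, so the result is a
# two-column DataFrame rather than a single Series of heights.
result = df1.apply(lambda row: df2[df2.Name == row.Name]['Height'], axis=1)
print(type(result).__name__, result.shape)
```

Because the two Series carry different labels, neither one fills the other's column, which is where the NaN values come from.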
To fix your code, you need to extract the single value:
df1['Height'] = df1.apply(lambda row: df2[df2.Name == row.Name]['Height'].iloc[0], axis = 1)
However, this is certainly not the best way to approach the problem. Take a look at map or merge instead. For example:
df1['Height'] = df1['Name'].map(df2.set_index('Name')['Height'])
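The merge route could look roughly like the sketch below. It assumes df1 starts with only the Name column (no placeholder Height), since otherwise merge would produce two suffixed Height columns; the name merged is just for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['John', 'Mary']})
df2 = pd.DataFrame({'Name': ['John', 'Mary'], 'Height': [5, 6]})

# A left join keeps every row of df1 and pulls in Height from df2 by Name.
merged = df1.merge(df2, on='Name', how='left')
print(merged)
```

With how='left', a name in df1 that has no match in df2 would end up with NaN in Height rather than being dropped.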