我有几个数据框。我想从第一个数据帧的2列中获取数据,以标记第二个数据帧中存在的行。第一个数据帧(df1)如下所示
Sup4 Seats Primary Seats Back up Seats
Pa 3 2 1
Ka 2 1 1
Ga 1 0 1
Gee 1 1 0
Re 2 2 0
(df2)看起来像
Sup4 First Last Primary Seats Backup Seats Rating
Pa Peter He NaN NaN 2.3
Ka Sonia Du NaN NaN 2.99
Ga Agnes Bla NaN NaN 3.24
Gee Jeffery Rus NaN NaN 3.5
Gee John Cro NaN NaN 1.3
Pa Pavol Rac NaN NaN 1.99
Pa Ciara Lee NaN NaN 1.88
Re David Wool NaN NaN 2.34
Re Stefan Rot NaN NaN 2
Re Franc Bor NaN NaN 1.34
Ka Tania Le NaN NaN 2.35
我还需要对每个Sup4名称所需的输出进行分组,方法是将等级从最高到最低排序,然后根据df1列主要席位和备用席位标记席位列。
我为样品的第一个Sup4名称Pa进行了分组和排序,我必须对所有名称都做
Sup4 First Last Primary Seats Backup Seats Rating
Pa Peter He M 2.3
Pa Pavol Rac M 1.99
Pa Ciara Lee M 1.88
Ka Sonia Du M 2.99
Ka Tania Le M 2.35
Ga Agnes Bla M 3.24
:
:
:
像这样继续
我一直尝试直到分组和排序
sorted_df = df2.sort_values(['Sup4','Rating'],ascending=[True,False])
但是我需要帮助传递df1列值以在第二个数据帧中进行标记
解决方案1:
您可以执行merge
,但是需要包括一些逻辑来更新您的Seats
列。另外,重要的是要提到您需要决定如何处理长度不相等的数据。〜啧啧and
Re`有两个dataframes长度不等。解决方案2中的更多信息:
df3 = (pd.merge(df2[['Sup4', 'First', 'Last', 'Rating']], df1, on='Sup4')
.sort_values(['Sup4', 'Rating'], ascending=[True, False]))
s = df3.groupby('Sup4', sort=False).cumcount() + 1
df3['Backup Seats'] = np.where(s - df3['Primary Seats'] > 0, 'M', '')
df3['Primary Seats'] = np.where(s <= df3['Primary Seats'], 'M', '')
df3 = df3[['Sup4', 'First', 'Last', 'Primary Seats', 'Backup Seats', 'Rating']]
df3
Out[1]:
Sup4 First Last Primary Seats Backup Seats Rating
5 Ga Agnes Bla M 3.24
6 Gee Jeffery Rus M 3.5
7 Gee John Cro M 1.3
3 Ka Sonia Du M 2.99
4 Ka Tania Le M 2.35
0 Pa Peter He M 2.3
1 Pa Pavol Rac M 1.99
2 Pa Ciara Lee M 1.88
8 Re David Wool M 2.34
9 Re Stefan Rot M 2.0
10 Re Franc Bor M 1.34
解决方案2:
完成此解决方案后,我意识到解决方案1会简单得多,但我想我也将其包括在内。此外,这还使您了解如何处理两个数据帧中大小不相等的值。您可以reindex
使用第一个数据框,combine_first()
但必须做一些准备。同样,您需要决定如何处理长度不相等的数据。在我的回答中,我只是排除Sup4
了长度不相等的组,以确保在最终调用时索引对齐combine_first()
:
# Purpose of `mtch` is to check if rows in second dataframe are equal to the count of seats in first.
# If not, then I have excluded the `Sup4` with unequal lengths in both dataframes
mtch = df1.groupby('Sup4')['Seats'].first().eq(df2.groupby('Sup4').size())
df1 = df1.sort_values('Sup4', ascending=True)[df1['Sup4'].isin(mtch[mtch].index)]
df1 = df1.reindex(df1.index.repeat(df1['Seats'])).reset_index(drop=True)
#`reindex` the dataframe, get the cumulative count, and manipulate data with `np.where`
df1 = df1.reindex(df1.index.repeat(df1['Seats'])).reset_index(drop=True)
s = df1.groupby('Sup4').cumcount() + 1
df1['Backup Seats'] = np.where(s - df1['Primary Seats'] > 0, 'M', '')
df1['Primary Seats'] = np.where(s <= df1['Primary Seats'], 'M', '')
#like df1, in df2 we exclude groups with uneven lengths and sort
df2 = (df2[df2['Sup4'].isin(mtch[mtch].index)]
.sort_values(['Sup4', 'Rating'], ascending=[True, False]).reset_index(drop=True))
#can use `combine_first` since we have ensured that the data is sorted and of equal lengths in both dataframes
df3 = df2.combine_first(df1)
#order columns and only include required columns
df3 = df3[['Sup4', 'First', 'Last', 'Primary Seats', 'Backup Seats', 'Rating']]
df3
Out[1]:
Sup4 First Last Primary Seats Backup Seats Rating
0 Ga Agnes Bla M 3.24
1 Ka Sonia Du M 2.99
2 Ka Tania Le M 2.35
3 Pa Peter He M 2.3
4 Pa Pavol Rac M 1.99
5 Pa Ciara Lee M 1.88
本文收集自互联网,转载请注明来源。
如有侵权,请联系[email protected] 删除。
我来说两句