Passing values from one DataFrame's columns to another DataFrame in pandas

Posted by debugcn in Dev

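I have several dataframes. I want to use the data in two columns of the first dataframe to mark rows in the second dataframe. The first dataframe (df1) looks like this: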

Sup4  Seats  Primary Seats  Backup Seats
Pa    3      2              1
Ka    2      1              1
Ga    1      0              1
Gee   1      1              0
Re    2      2              0

The second dataframe (df2) looks like this:

Sup4  First    Last  Primary Seats  Backup Seats  Rating
Pa    Peter    He    NaN            NaN           2.3
Ka    Sonia    Du    NaN            NaN           2.99
Ga    Agnes    Bla   NaN            NaN           3.24
Gee   Jeffery  Rus   NaN            NaN           3.5
Gee   John     Cro   NaN            NaN           1.3
Pa    Pavol    Rac   NaN            NaN           1.99
Pa    Ciara    Lee   NaN            NaN           1.88
Re    David    Wool  NaN            NaN           2.34
Re    Stefan   Rot   NaN            NaN           2
Re    Franc    Bor   NaN            NaN           1.34
Ka    Tania    Le    NaN            NaN           2.35
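
For anyone who wants to reproduce the problem, the two tables above could be built roughly as follows. This is only a sketch: the values are copied from the tables, and the third df1 column is assumed to be named `Backup Seats` (one word) so that it matches df2 and the code in the answers.

```python
import numpy as np
import pandas as pd

# Seat allocation per Sup4 group (values copied from the df1 table).
# Assumption: the column is spelled "Backup Seats", matching df2.
df1 = pd.DataFrame({
    "Sup4": ["Pa", "Ka", "Ga", "Gee", "Re"],
    "Seats": [3, 2, 1, 1, 2],
    "Primary Seats": [2, 1, 0, 1, 2],
    "Backup Seats": [1, 1, 1, 0, 0],
})

# People to be seated (values copied from the df2 table).
# The two seat columns start out empty (NaN).
df2 = pd.DataFrame({
    "Sup4": ["Pa", "Ka", "Ga", "Gee", "Gee", "Pa", "Pa", "Re", "Re", "Re", "Ka"],
    "First": ["Peter", "Sonia", "Agnes", "Jeffery", "John", "Pavol",
              "Ciara", "David", "Stefan", "Franc", "Tania"],
    "Last": ["He", "Du", "Bla", "Rus", "Cro", "Rac",
             "Lee", "Wool", "Rot", "Bor", "Le"],
    "Primary Seats": np.nan,
    "Backup Seats": np.nan,
    "Rating": [2.3, 2.99, 3.24, 3.5, 1.3, 1.99, 1.88, 2.34, 2.0, 1.34, 2.35],
})
```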

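The desired output should also be grouped by Sup4 name, with the ratings in each group sorted from highest to lowest, and the seat columns marked according to the Primary Seats and Backup Seats columns of df1.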

In the sample below I have grouped and sorted the rows for the first Sup4 name, Pa; I need to do the same for all names:

Sup4    First   Last      Primary Seats   Backup Seats  Rating
Pa      Peter   He                  M                     2.3
Pa      Pavol   Rac                 M                     1.99
Pa      Ciara   Lee                           M           1.88
Ka      Sonia   Du                  M                     2.99
Ka      Tania   Le                            M           2.35
Ga      Agnes   Bla                           M           3.24
:
:
:

and so on.

I have gotten as far as the grouping and sorting:

sorted_df = df2.sort_values(['Sup4','Rating'],ascending=[True,False])

But I need help using the df1 column values to mark the rows in the second dataframe.

David Erickson

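Solution 1:

You can do a merge, but you need to include some logic to update your seat columns. It is also important to mention that you need to decide what to do with data of unequal length: `Gee` and `Re` have more rows in df2 than seats in df1. More information in Solution 2: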

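import numpy as np
import pandas as pd

# Attach df1's seat counts to every person, then sort by rating within each Sup4
df3 = (pd.merge(df2[['Sup4', 'First', 'Last', 'Rating']], df1, on='Sup4')
         .sort_values(['Sup4', 'Rating'], ascending=[True, False]))
# 1-based position of each row within its Sup4 group
s = df3.groupby('Sup4', sort=False).cumcount() + 1
# Rows beyond the primary count are marked as backup; the rest as primary
df3['Backup Seats'] = np.where(s - df3['Primary Seats'] > 0, 'M', '')
df3['Primary Seats'] = np.where(s <= df3['Primary Seats'], 'M', '')
df3 = df3[['Sup4', 'First', 'Last', 'Primary Seats', 'Backup Seats', 'Rating']]
df3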
Out[1]: 
   Sup4    First  Last Primary Seats Backup Seats  Rating
5    Ga    Agnes   Bla                          M    3.24
6   Gee  Jeffery   Rus             M                  3.5
7   Gee     John   Cro                          M     1.3
3    Ka    Sonia    Du             M                 2.99
4    Ka    Tania    Le                          M    2.35
0    Pa    Peter    He             M                  2.3
1    Pa    Pavol   Rac             M                 1.99
2    Pa    Ciara   Lee                          M    1.88
8    Re    David  Wool             M                 2.34
9    Re   Stefan   Rot             M                  2.0
10   Re    Franc   Bor                          M    1.34
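
Note that in the output above, John (Gee) and Franc (Re) are marked as backup even though df1 lists 0 backup seats for those groups, because every row past the primary count gets an `M`. If that is not wanted, one possible variant is to cap the backup marks at the backup quota as well. The following is only a sketch on a trimmed-down copy of the data; the helper column names `n_primary` and `n_backup` are made up for illustration, and whether surplus people should be left unmarked is an assumption about the requirement.

```python
import numpy as np
import pandas as pd

# Trimmed-down sample: Pa fits its quota exactly, Gee has one person too many.
df1 = pd.DataFrame({
    "Sup4": ["Pa", "Gee"],
    "Primary Seats": [2, 1],
    "Backup Seats": [1, 0],
})
df2 = pd.DataFrame({
    "Sup4": ["Pa", "Gee", "Gee", "Pa", "Pa"],
    "First": ["Peter", "Jeffery", "John", "Pavol", "Ciara"],
    "Rating": [2.3, 3.5, 1.3, 1.99, 1.88],
})

# Rename the quota columns (hypothetical names) so they do not clash
# with the marker columns created below.
quota = df1.rename(columns={"Primary Seats": "n_primary",
                            "Backup Seats": "n_backup"})

out = (df2.merge(quota, on="Sup4", how="left")
          .sort_values(["Sup4", "Rating"], ascending=[True, False]))

# 1-based position of each person within their Sup4 group, best rating first.
pos = out.groupby("Sup4", sort=False).cumcount() + 1

# Primary: the first n_primary positions of each group.
out["Primary Seats"] = np.where(pos <= out["n_primary"], "M", "")
# Backup: the next n_backup positions; anyone beyond that stays unmarked.
out["Backup Seats"] = np.where(
    (pos > out["n_primary"]) & (pos <= out["n_primary"] + out["n_backup"]),
    "M", "")

out = out[["Sup4", "First", "Primary Seats", "Backup Seats", "Rating"]]
```

With this sample, John ends up with both seat columns empty, while Ciara is still marked as Pa's backup.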

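Solution 2:

After finishing this solution, I realized that Solution 1 would be much simpler, but I thought I would include this one as well. It also gives you some insight into how to handle groups whose sizes differ between the two dataframes. You can `reindex` the first dataframe and use `combine_first()`, but you have to do some preparation. Again, you need to decide how to handle data of unequal length. In my answer, I simply excluded the `Sup4` groups with unequal lengths, to guarantee that the indices align in the final call to `combine_first()`: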

# `mtch` flags each `Sup4` whose row count in df2 equals its `Seats` value in df1.
# Groups where the two differ are excluded from both dataframes.
mtch = df1.groupby('Sup4')['Seats'].first().eq(df2.groupby('Sup4').size())
# Filter first, then sort, so the boolean mask stays aligned with the rows it selects
df1 = df1[df1['Sup4'].isin(mtch[mtch].index)].sort_values('Sup4', ascending=True)

# Repeat each df1 row `Seats` times (once only), get the cumulative count,
# and mark the seats with `np.where`
df1 = df1.reindex(df1.index.repeat(df1['Seats'])).reset_index(drop=True)
s = df1.groupby('Sup4').cumcount() + 1
df1['Backup Seats'] = np.where(s - df1['Primary Seats'] > 0, 'M', '')
df1['Primary Seats'] = np.where(s <= df1['Primary Seats'], 'M', '')

# Like df1, exclude the groups with unequal lengths from df2, then sort
df2 = (df2[df2['Sup4'].isin(mtch[mtch].index)]
       .sort_values(['Sup4', 'Rating'], ascending=[True, False]).reset_index(drop=True))

# `combine_first` works here because both dataframes are now sorted the same way
# and have equal lengths, so their indices line up row by row
df3 = df2.combine_first(df1)

# Order the columns and keep only the required ones
df3 = df3[['Sup4', 'First', 'Last', 'Primary Seats', 'Backup Seats', 'Rating']]
df3
Out[1]: 
  Sup4  First Last Primary Seats Backup Seats  Rating
0   Ga  Agnes  Bla                          M    3.24
1   Ka  Sonia   Du             M                 2.99
2   Ka  Tania   Le                          M    2.35
3   Pa  Peter   He             M                  2.3
4   Pa  Pavol  Rac             M                 1.99
5   Pa  Ciara  Lee                          M    1.88
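
Before deciding how to handle the groups that this solution drops, it may help to see which ones they are. A minimal sketch of such a check, using only the `Sup4` and `Seats` values from the question (the column names `seats` and `people` are made up for illustration):

```python
import pandas as pd

# Only the columns needed for the check, copied from the question's tables.
df1 = pd.DataFrame({"Sup4": ["Pa", "Ka", "Ga", "Gee", "Re"],
                    "Seats": [3, 2, 1, 1, 2]})
df2 = pd.DataFrame({"Sup4": ["Pa", "Ka", "Ga", "Gee", "Gee", "Pa",
                             "Pa", "Re", "Re", "Re", "Ka"]})

# Side-by-side comparison: seats offered in df1 vs. people listed in df2.
# Building the frame from two Series aligns them on the Sup4 index.
check = pd.DataFrame({
    "seats": df1.set_index("Sup4")["Seats"],
    "people": df2.groupby("Sup4").size(),
})

# Groups where the two counts disagree (here Gee and Re).
mismatched = check[check["seats"] != check["people"]]
```

The groups listed in `mismatched` are the ones that would need a decision: drop them as above, leave the surplus people unmarked, or treat them some other way.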

Edited on 2021-04-5