pandas / Python: inserting part of a column into the column based on a condition

Posted by Jessica on Dev

Jessica

I have a large dataset to work with, but here I am using a mock dataset:

import pandas as pd

data = {'Block': [1, 1, 1, 1, 1, 1, 1, 1, 1],
        'Concentration': [100, 100, 100, 33, 33, 33, 0, 0, 0],
        'Name': ['A', 'A', 'A', 'A', 'A', 'A', 'PB', 'PB', 'PB'],
        'value': [86, 194, 452, 140, 285, 2011, 100, 111, 222]}

data = pd.DataFrame(data)

It looks like this:

In [12]: data
Out[12]: 
     Block  Concentration Name  value
0      1            100    A     86
1      1            100    A    194
2      1            100    A    452
3      1             33    A    140
4      1             33    A    285
5      1             33    A   2011
6      1              0   PB    100
7      1              0   PB    111
8      1              0   PB    222

The real dataset has 24 blocks, 3 concentration types, and 5 names per block.

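Within each block, I want to add 3 new rows with concentration 0 for every name other than 'PB', and fill those new rows with the values from that block's 'PB' rows.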

For the mock dataset here, the desired output would be:

In [13]: data2
Out[13]: 
      Block  Concentration Name  value
0       1            100    A     86
1       1            100    A    194
2       1            100    A    452
3       1             33    A    140
4       1             33    A    285
5       1             33    A   2011
6       1              0    A    100
7       1              0    A    111
8       1              0    A    222
9       1              0   PB    100
10      1              0   PB    111
11      1              0   PB    222

So far, my code only gets as far as grabbing the 'PB' rows for each block:

def PBvalue(sgrp):
    # keep only the 'PB' rows of this group (mask built from the group itself, not the global frame)
    return sgrp.loc[sgrp['Name'] == 'PB'].copy()

PBvalues = data.groupby(['Block', 'Concentration']).apply(PBvalue)

Output:

In [30]: PBvalues
Out[30]: 
                            Block  Concentration Name  value
 Block Concentration                                    
   1     0             6      1              0   PB    100
                       7      1              0   PB    111
                       8      1              0   PB    222
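
As a side note, selecting the 'PB' rows does not strictly need `groupby`. The following is a minimal sketch, assuming the same 9-row mock frame as above; the names `pb_rows` and `pb_by_block` are illustrative only and not part of the original code.

```python
import pandas as pd

# Same mock data as in the question (one block, name A plus the PB rows).
data = pd.DataFrame({
    'Block': [1] * 9,
    'Concentration': [100, 100, 100, 33, 33, 33, 0, 0, 0],
    'Name': ['A'] * 6 + ['PB'] * 3,
    'value': [86, 194, 452, 140, 285, 2011, 100, 111, 222],
})

# A plain boolean mask keeps every 'PB' row, whatever its block.
pb_rows = data[data['Name'] == 'PB']

# If the PB rows are needed block by block, split the filtered frame afterwards.
pb_by_block = {block: grp for block, grp in pb_rows.groupby('Block')}
```

Here `pb_by_block[1]` holds the three 'PB' rows of block 1, with the original flat index rather than the nested one shown above.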

Joseph

Here is the code:

import pandas as pd

# create the mock dataframe with 3 blocks

data1 = pd.DataFrame({'Block': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    'Concentration': [100, 100, 100, 33, 33, 33, 100, 100, 100, 33, 33, 33, 0, 0, 0],
    'Name': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B', 'PB', 'PB', 'PB'],
    'value': [86, 194, 452, 140, 285, 2011, 8, 19, 45, 14, 28, 201, 100, 111, 222]})

# blocks 2 and 3 are copies of block 1 with a different Block label
data2 = data1.copy(); data2['Block'] = 2
data3 = data1.copy(); data3['Block'] = 3

data = pd.concat([data1, data2, data3], axis=0)

def temp1(df):
    # df holds all rows of a single block
    df_others = df[df.Name != 'PB']
    df_pb = df[df.Name == 'PB']
    def temp2(dfx):
        # dfx holds the rows of one non-PB name within the block
        df_app = df_pb[df_pb.Concentration == 0].copy()  # in case 'PB' has more than one concentration
        df_app['Name'] = dfx['Name'].values[0]  # relabel the copied PB rows with the current name
        df_pername = pd.concat([dfx, df_app])
        return df_pername
    df1 = df_others.groupby('Name', group_keys=False).apply(temp2)
    df2 = pd.concat([df1, df_pb])  # put the block's original PB rows back at the end
    return df2

data_changed = data.groupby('Block', group_keys=False).apply(temp1)

# replace the duplicated per-block index with a clean 0..n-1 range
data_changed = data_changed.reset_index(drop=True)

In [151]: data_changed
Out[151]: 
    Block  Concentration Name  value
0       1            100    A     86
1       1            100    A    194
2       1            100    A    452
3       1             33    A    140
4       1             33    A    285
5       1             33    A   2011
6       1              0    A    100
7       1              0    A    111
8       1              0    A    222
9       1            100    B      8
10      1            100    B     19
11      1            100    B     45
12      1             33    B     14
13      1             33    B     28
14      1             33    B    201
15      1              0    B    100
16      1              0    B    111
17      1              0    B    222
18      1              0   PB    100
19      1              0   PB    111
20      1              0   PB    222
..    ...            ...  ...    ...
58      3              0    B    111
59      3              0    B    222
60      3              0   PB    100
61      3              0   PB    111
62      3              0   PB    222

[63 rows x 4 columns]
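
For comparison, the same result can be sketched without nested `groupby(...).apply` calls, whose handling of the grouping columns has, as far as I know, changed across pandas releases. This is only an alternative under stated assumptions, not the method used above: it assumes the 3-block mock data from this answer, and the helper columns `_pb_last` and `_pos` as well as the names `pb_zero`, `names`, and `new_rows` are made up for illustration.

```python
import pandas as pd

# Mock data mirroring the answer: names A and B plus PB rows, repeated for 3 blocks.
block1 = pd.DataFrame({
    'Block': [1] * 15,
    'Concentration': [100, 100, 100, 33, 33, 33, 100, 100, 100, 33, 33, 33, 0, 0, 0],
    'Name': ['A'] * 6 + ['B'] * 6 + ['PB'] * 3,
    'value': [86, 194, 452, 140, 285, 2011, 8, 19, 45, 14, 28, 201, 100, 111, 222],
})
data = pd.concat([block1.assign(Block=b) for b in (1, 2, 3)], ignore_index=True)

is_pb = data['Name'] == 'PB'

# PB rows at concentration 0: these carry the values to copy.
pb_zero = data.loc[is_pb & (data['Concentration'] == 0),
                   ['Block', 'Concentration', 'value']]

# One row per (Block, Name) pair for every non-PB name.
names = data.loc[~is_pb, ['Block', 'Name']].drop_duplicates()

# Joining on Block pairs each non-PB name with every PB row of the same block.
new_rows = names.merge(pb_zero, on='Block')

# Append the new rows, then order by block, non-PB names first, original position last.
merged = pd.concat([data, new_rows], ignore_index=True)
merged['_pb_last'] = merged['Name'] == 'PB'   # hypothetical helper: False sorts before True
merged['_pos'] = range(len(merged))           # hypothetical helper: keeps appended rows after originals
result = (merged.sort_values(['Block', '_pb_last', 'Name', '_pos'])
                .drop(columns=['_pb_last', '_pos'])
                .reset_index(drop=True))
```

With this mock data, `result` has 63 rows, and each block ends with its original 'PB' rows. The merge only duplicates the PB rows once per (Block, Name) pair, so it should scale to the full 24-block dataset without a Python-level loop over groups.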