I have a DataFrame column full of text and prices:
0 £75 BT Reward Card
1 £125 BT Reward Card
2 £50 Retail Voucher
3 £100 BT Reward Card
4 £150 BT Reward Card
5 £50 Cashback
6 Fibre Connection Fee (£50 Credit
7 £75 BT Reward Card
8 £125 BT Reward Card
9 £50 Cashback
10 £0 Fibre Connection Fee (£50 Credit
I only want to return the number directly after the £ sign.
So far I have this, but it falls apart for indexes 6 and 10:
df['col'] = df['col'].apply(lambda x: x.split(' ')[0])
I have also tried this:
df['col'] = df['col'].apply(lambda x: x.split('£')[1])
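For reference, here is a small check of what the two attempts return on the problem rows. It uses plain Python strings copied from the sample above, and the variable names are only illustrative. Splitting on a space keeps the £ sign and picks up "Fibre" when the price is not the first token, while splitting on £ keeps everything after the first £, not just the digits.

```python
# Three sample rows taken from the column shown above
rows = [
    "£75 BT Reward Card",
    "Fibre Connection Fee (£50 Credit",
    "£0 Fibre Connection Fee (£50 Credit",
]

# Attempt 1: first whitespace-separated token
first_token = [s.split(" ")[0] for s in rows]

# Attempt 2: whatever follows the first £ sign
after_pound = [s.split("£")[1] for s in rows]

print(first_token)
print(after_pound)
```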
If you need only the first value, use str.extract and convert to integers if necessary:
# raw string avoids an invalid-escape warning for \d; expand=False returns a Series
df['new'] = df['col'].str.extract(r'£(\d+)', expand=False).astype(int)
print (df)
col new
0 £75 BT Reward Card 75
1 £125 BT Reward Card 125
2 £50 Retail Voucher 50
3 £100 BT Reward Card 100
4 £150 BT Reward Card 150
5 £50 Cashback 50
6 Fibre Connection Fee (£50 Credit 50
7 £75 BT Reward Card 75
8 £125 BT Reward Card 125
9 £50 Cashback 50
10 £0 Fibre Connection Fee (£50 Credit 0
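Note that astype(int) only works when every row contains a match, as it does in the sample. If some rows might have no £ amount at all, one possible variant (a sketch, not part of the original answer) is to convert through pd.to_numeric and the nullable Int64 dtype. The "No voucher" row below is a hypothetical addition for illustration.

```python
import pandas as pd

df = pd.DataFrame({"col": [
    "£75 BT Reward Card",
    "Fibre Connection Fee (£50 Credit",
    "£0 Fibre Connection Fee (£50 Credit",
    "No voucher",  # hypothetical row with no £ amount
]})

# expand=False returns a Series; rows without a match become NaN
amounts = df["col"].str.extract(r"£(\d+)", expand=False)

# The nullable Int64 dtype keeps missing values as <NA> instead of raising
df["new"] = pd.to_numeric(amounts).astype("Int64")

print(df)
```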
If you need all values as lists, use str.findall:
# values are strings
df['new'] = df['col'].str.findall(r'£(\d+)')
# values are integers
#df['new'] = df['col'].str.findall(r'£(\d+)').apply(lambda x: [int(y) for y in x])
print (df)
col new
0 £75 BT Reward Card [75]
1 £125 BT Reward Card [125]
2 £50 Retail Voucher [50]
3 £100 BT Reward Card [100]
4 £150 BT Reward Card [150]
5 £50 Cashback [50]
6 Fibre Connection Fee (£50 Credit [50]
7 £75 BT Reward Card [75]
8 £125 BT Reward Card [125]
9 £50 Cashback [50]
10 £0 Fibre Connection Fee (£50 Credit [0, 50]
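A self-contained sketch of the integer variant, using a shortened copy of the sample data plus one hypothetical row without any £ amount. For that row str.findall returns an empty list rather than NaN.

```python
import pandas as pd

df = pd.DataFrame({"col": [
    "£50 Cashback",
    "£0 Fibre Connection Fee (£50 Credit",
    "No voucher",  # hypothetical row with no £ amount
]})

# findall returns a list of every captured group per row (as strings)
found = df["col"].str.findall(r"£(\d+)")

# Convert each list of digit strings to a list of integers
df["new"] = found.apply(lambda xs: [int(x) for x in xs])

print(df)
```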
If you need them in new columns, use str.extractall with unstack, add_prefix and join:
df = df.join(df['col'].str.extractall(r'£(\d+)')[0].unstack().astype(float).add_prefix('new'))
print (df)
col new0 new1
0 £75 BT Reward Card 75.0 NaN
1 £125 BT Reward Card 125.0 NaN
2 £50 Retail Voucher 50.0 NaN
3 £100 BT Reward Card 100.0 NaN
4 £150 BT Reward Card 150.0 NaN
5 £50 Cashback 50.0 NaN
6 Fibre Connection Fee (£50 Credit 50.0 NaN
7 £75 BT Reward Card 75.0 NaN
8 £125 BT Reward Card 125.0 NaN
9 £50 Cashback 50.0 NaN
10 £0 Fibre Connection Fee (£50 Credit 0.0 50.0
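The one-liner above can be easier to follow when broken into steps. This is a minimal sketch on two of the sample rows, and the intermediate names (matches, wide, out) are only illustrative. extractall returns one row per match, indexed by the original row label and a match counter; unstack pivots that counter into columns.

```python
import pandas as pd

df = pd.DataFrame({"col": [
    "£75 BT Reward Card",
    "£0 Fibre Connection Fee (£50 Credit",
]})

# One entry per match, with a MultiIndex of (original row, match number)
matches = df["col"].str.extractall(r"£(\d+)")[0]

# Pivot the match number into columns 0, 1, ...; missing matches become NaN,
# which is why the values are converted to float rather than int
wide = matches.unstack().astype(float).add_prefix("new")

# Align back to the original rows on the index
out = df.join(wide)

print(out)
```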