I have a df like this, and I want to turn the lists of values in the device column into separate columns:
```
   uid      device
0  000  [1.0, 3.0]
1  001       [3.0]
2  003       [nan]
3  004  [2.0, 3.0]
4  005       [1.0]
5  006       [1.0]
6  006       [nan]
7  007       [2.0]
```
It should become:
```
   uid      device  NA  just_1  just_2or3  Both
0  000  [1.0, 3.0]   0       0          0     1
1  001       [3.0]   0       0          1     0
2  003       [nan]   1       0          0     0
3  004  [2.0, 3.0]   0       0          1     0
4  005       [1.0]   0       1          0     0
5  006       [1.0]   0       1          0     0
6  006       [nan]   1       0          0     0
7  007       [2.0]   0       0          1     0
8  008  [1.0, 2.0]   0       0          0     1
```
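I want to convert the lists to dummy variables. If device contains only 1.0, set just_1 to 1. If it is 2.0, 3.0, or [2.0, 3.0], set just_2or3 to 1. Set Both to 1 only when the list contains 1.0 together with 2.0 or 3.0 (for example [1.0, 3.0] or [1.0, 2.0]).
How can I do this? Thank you.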
You can apply a custom function f to each list in the device column and then cast the resulting boolean values to int with astype:
```
import numpy as np
import pandas as pd

df = pd.DataFrame({'uid': ['000', '001', '002', '003', '004', '005', '006', '007'],
                   'device': [[1.0, 3.0], [3.0], [np.nan], [2.0, 3.0],
                              [1.0], [1.0], [np.nan], [2.0]]})
print(df)
```
```
   uid      device
0  000  [1.0, 3.0]
1  001       [3.0]
2  002       [nan]
3  003  [2.0, 3.0]
4  004       [1.0]
5  005       [1.0]
6  006       [nan]
7  007       [2.0]
```
```
def f(x):
    # x is the list of device codes for one uid
    NA = any(pd.isna(v) for v in x)  # also catches NaN objects other than np.nan
    has_1 = 1 in x
    has_2or3 = 2 in x or 3 in x
    just_1 = has_1 and not has_2or3
    just_2or3 = has_2or3 and not has_1
    both = has_1 and has_2or3
    return pd.Series([NA, just_1, just_2or3, both],
                     index=['NA', 'just_1', 'just_2or3', 'both'])

# One row of flags per uid; astype(int) turns the booleans into 0/1
print(df.set_index('uid').device.apply(f).astype(int).reset_index())
```
```
   uid  NA  just_1  just_2or3  both
0  000   0       0          0     1
1  001   0       0          1     0
2  002   1       0          0     0
3  003   0       0          1     0
4  004   0       1          0     0
5  005   0       1          0     0
6  006   1       0          0     0
7  007   0       0          1     0
```
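As a possible variation, the same flags can be computed with plain Python sets before involving pandas at all. The sketch below is an illustration only, not the answer's own method: `classify` and `devices` are hypothetical names. It checks for missing values with `math.isnan`, because a membership test such as `np.nan in x` only succeeds when the list holds that very same `np.nan` object and would miss a NaN created elsewhere, for example by `float('nan')`.

```python
import math


def classify(values):
    """Return 0/1 flags for one device list (hypothetical helper)."""
    # Keep only the real device codes; NaN entries are handled separately.
    present = {v for v in values if not math.isnan(v)}
    has_na = any(math.isnan(v) for v in values)
    has_1 = 1.0 in present
    has_2or3 = bool(present & {2.0, 3.0})
    return {
        "NA": int(has_na),
        "just_1": int(has_1 and not has_2or3),
        "just_2or3": int(has_2or3 and not has_1),
        "both": int(has_1 and has_2or3),
    }


# Same device lists as in the example data above (assumed to hold floats only).
devices = [[1.0, 3.0], [3.0], [float("nan")], [2.0, 3.0],
           [1.0], [1.0], [float("nan")], [2.0]]

rows = [classify(v) for v in devices]
for row in rows:
    print(row)
```

If pandas is available, `pd.DataFrame(rows, index=df.index)` should turn `rows` into the flag columns, which can then be attached to the original frame with `df.join(...)`.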