Suppose I have a dataset with this structure:
pet_name doggo floofer puppo pupper
A None floofer None None
B doggo None None None
C None None puppo None
D None None None pupper
E doggo floofer None None
F None None puppo pupper
G None None None None
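I want a new column named dog_stage that contains whichever of the stage values (doggo, floofer, puppo, pupper) are set for each row.
The final result would look like this: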
name dog_stage
A floofer
B doggo
C puppo
D pupper
E doggo, floofer
F puppo, pupper
G None
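with the original stage columns removed.
For both solutions, first keep only the necessary columns (the code assumes the name column is called name; substitute pet_name if that is what your data uses):
df = df[['name', 'doggo', 'floofer', 'puppo', 'pupper']].copy()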
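The first solution joins the column names of every cell that is not None (either a real NoneType value or the string 'None'), using matrix multiplication with the column names via DataFrame.dot: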
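import numpy as np

# Move name into the index, turn the string 'None' into NaN, then flag the non-missing cells
df1 = df.set_index('name').replace('None', np.nan).notna()
# Multiply the boolean mask by the column names (each with a trailing comma), then strip the last comma
df1 = df1.dot(df1.columns + ',').str[:-1].reset_index(name='dog_stage')
print(df1)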
name dog_stage
0 A floofer
1 B doggo
2 C puppo
3 D pupper
4 E doggo,floofer
5 F puppo,pupper
6 G
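The output above differs slightly from what the question asked for: the separator is ',' rather than ', ', and row G holds an empty string rather than a missing value. A minimal, self-contained sketch of the same DataFrame.dot idea adjusted for both points follows. The sample frame is rebuilt from the question's table, and it assumes missing stages are stored as the string 'None'; the variable names mask and result are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumption: missing stages are the string 'None')
df = pd.DataFrame({
    'name':    list('ABCDEFG'),
    'doggo':   ['None', 'doggo', 'None', 'None', 'doggo', 'None', 'None'],
    'floofer': ['floofer', 'None', 'None', 'None', 'floofer', 'None', 'None'],
    'puppo':   ['None', 'None', 'puppo', 'None', 'None', 'puppo', 'None'],
    'pupper':  ['None', 'None', 'None', 'pupper', 'None', 'pupper', 'None'],
})

# Boolean mask: True where a stage is present
mask = df.set_index('name').replace('None', np.nan).notna()

# True * 'doggo, ' gives 'doggo, ' and False * 'doggo, ' gives '',
# so the dot product concatenates the names of the True columns
stages = mask.dot(mask.columns + ', ').str[:-2]

# Rows with no stage end up as '', so mark them as missing instead
result = stages.replace('', np.nan).reset_index(name='dog_stage')
print(result)
```

Here the slice is [:-2] because the separator appended to each column name is two characters long.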
Another idea is to join the values of each row in a lambda function, skipping anything that is None:
df1 = (df.set_index('name')
         .replace('None', np.nan)
         .apply(lambda x: ','.join(x.dropna()), axis=1)
         .reset_index(name='dog_stage'))
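To check that the row-wise join handles both kinds of missing value mentioned earlier, here is a small self-contained sketch. The three-row frame is made up for illustration (it reuses rows E, F and G from the question) and deliberately mixes real None values with the string 'None'.

```python
import numpy as np
import pandas as pd

# Illustrative data: a mix of real None and the string 'None'
df = pd.DataFrame({
    'name':    ['E', 'F', 'G'],
    'doggo':   ['doggo', None, 'None'],
    'floofer': ['floofer', 'None', None],
    'puppo':   [None, 'puppo', 'None'],
    'pupper':  ['None', 'pupper', None],
})

df1 = (df.set_index('name')
         .replace('None', np.nan)                         # string 'None' becomes NaN
         .apply(lambda x: ','.join(x.dropna()), axis=1)   # dropna removes both NaN and None
         .reset_index(name='dog_stage'))
print(df1)
```

Because apply with axis=1 calls the lambda once per row, this version is usually slower than the DataFrame.dot approach on large frames, but it is easier to adapt, for example to a different separator.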