Applying a re function to a mixed-type pandas DataFrame

Posted by debugcn in Dev

水库投资

I have the following DataFrame df:

   RID                  Other_aided Ultibro Relvar
0  701              {_12,_101,_102}    {_9}    NaN
1  702                 {_7,_11,_16}    {_7}    NaN
2  703  {_12,_101,_102,_10,_11,_16}    {_7}    NaN
3  704                  {_5,_3,_16}     NaN    NaN
4  705       {_101,_102,_10,_3,_16}    {_6}    NaN
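
To reproduce the problem, the table above can be rebuilt roughly as follows. This is only a sketch: it assumes the braced cells are plain strings (which the use of re.sub suggests) and that the missing cells are float NaN.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table above.
# Assumption: the "{...}" cells are strings, and missing cells are np.nan.
df = pd.DataFrame({
    "RID": [701, 702, 703, 704, 705],
    "Other_aided": [
        "{_12,_101,_102}",
        "{_7,_11,_16}",
        "{_12,_101,_102,_10,_11,_16}",
        "{_5,_3,_16}",
        "{_101,_102,_10,_3,_16}",
    ],
    "Ultibro": ["{_9}", "{_7}", "{_7}", np.nan, "{_6}"],
    "Relvar": [np.nan] * 5,
})
print(df)
```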

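I want to clean df as follows:

Remove the characters {, } and _ from the data columns wherever they occur.
Replace NaN with the empty string ''.
Leave the first column of integer IDs (RID) untouched.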

I wrote the following function f:

import re
f = lambda x: re.sub(r'[^0-9,]','', x)

Running it:

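df.Other_aided.apply(f) works fine, because that single column contains only strings.
df.Ultibro.apply(f) and df.Relvar.apply(f), however, fail because of the NaN values: TypeError: expected string or bytes-like object
So I thought converting the data columns to strings would help: df.iloc[:, 1:].apply(lambda y: f(str(y)), axis=1). But that fails as well, producing incorrect output: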

0         175,9,10,3,11,1612,101,102810109918280,
1                    159,10,37,11,16710710717281,
...

How can df be cleaned?

耶斯列尔

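If you want to keep using your function, first replace the NaNs with empty strings, then pass the function to DataFrame.applymap so that it is applied element-wise: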

f = lambda x: re.sub(r'[^0-9,]','', x)
df.iloc[:, 1:] = df.iloc[:, 1:].fillna('').applymap(f)
print(df)
   RID          Other_aided Ultibro Relvar
0  701           12,101,102       9       
1  702              7,11,16       7       
2  703  12,101,102,10,11,16       7       
3  704               5,3,16               
4  705      101,102,10,3,16       6
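
On newer pandas versions the same idea may need small adjustments: DataFrame.applymap was deprecated in pandas 2.1 in favour of DataFrame.map, and assigning strings into an all-NaN float column through iloc can trigger a dtype warning. The sketch below is one way to hedge against both, assuming string cells and using a reduced two-row stand-in for df; the hasattr check and the label-based assignment are my own choices, not part of the original answer.

```python
import re

import numpy as np
import pandas as pd

f = lambda x: re.sub(r'[^0-9,]', '', x)

# Reduced stand-in for the question's df (assumption: cells are strings).
df = pd.DataFrame({
    "RID": [701, 704],
    "Other_aided": ["{_12,_101,_102}", "{_5,_3,_16}"],
    "Ultibro": ["{_9}", np.nan],
    "Relvar": [np.nan, np.nan],
})

cols = df.columns[1:]            # every column except RID
data = df[cols].fillna('')       # NaN -> '' so that f only ever sees strings

# DataFrame.map exists from pandas 2.1 on; fall back to applymap otherwise.
cleaned = data.map(f) if hasattr(data, "map") else data.applymap(f)

# Assigning by column labels replaces the columns outright,
# so the float column Relvar can become a column of strings.
df[cols] = cleaned
print(df)
```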

Or use DataFrame.replace:

df.iloc[:, 1:] = df.iloc[:, 1:].fillna('').replace(r'[^0-9,]','', regex=True)
print(df)
   RID          Other_aided Ultibro Relvar
0  701           12,101,102       9       
1  702              7,11,16       7       
2  703  12,101,102,10,11,16       7       
3  704               5,3,16               
4  705      101,102,10,3,16       6

# If the first column never has missing values, fillna('') does not change it,
# so the whole DataFrame can be processed at once
df = df.fillna('').replace(r'[^0-9,]','', regex=True)
print(df)
   RID          Other_aided Ultibro Relvar
0  701           12,101,102       9       
1  702              7,11,16       7       
2  703  12,101,102,10,11,16       7       
3  704               5,3,16               
4  705      101,102,10,3,16       6
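
As for why the str(y) attempt in the question produced garbage: with axis=1, y is an entire row (a Series), so str(y) is the printed form of that Series, including the column labels and the "Name: ..., dtype: ..." footer. The regex then keeps every digit and comma in that printout. The sketch below illustrates this on a single row and shows a NaN-tolerant alternative; clean_cell is a hypothetical helper name, not something from the original answer.

```python
import re

import numpy as np
import pandas as pd

f = lambda x: re.sub(r'[^0-9,]', '', x)

# One row of the data columns, as DataFrame.apply(..., axis=1) would pass it.
row = pd.Series(
    {"Other_aided": "{_7,_11,_16}", "Ultibro": "{_7}", "Relvar": np.nan},
    name=1,
)

text = str(row)      # the whole printout, footer included
leaked = f(text)     # digits from every cell and from the footer run together
print(repr(leaked))


def clean_cell(value):
    """Hypothetical helper: keep digits and commas; non-strings (e.g. NaN) become ''."""
    if not isinstance(value, str):
        return ''
    return re.sub(r'[^0-9,]', '', value)


# Applied per element, each cell is cleaned on its own.
cleaned = row.apply(clean_cell)
print(cleaned.tolist())
```

Because clean_cell checks the type itself, it can be passed to Series.apply or to the element-wise DataFrame method without a prior fillna('').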
