After struggling with a CSV file's encoding, I decided to commit the encoding heresy of manually replacing some characters.
This is how the dataframe looks:
import pandas as pd

df = pd.DataFrame({'a': 'bÉd encoded',
                   'b': ['foo', 'bar'] * 3,
                   'c': 'bÉd encoded too'})
             a    b                c
0  bÉd encoded  foo  bÉd encoded too
1  bÉd encoded  bar  bÉd encoded too
2  bÉd encoded  foo  bÉd encoded too
3  bÉd encoded  bar  bÉd encoded too
4  bÉd encoded  foo  bÉd encoded too
5  bÉd encoded  bar  bÉd encoded too
If column 'a' were my only problem, this function would be enough:
def force_good_e(row):
    col = row['a']
    if 'É' in col:
        col = col.replace('É', 'a')
    return col

df['a'] = df.apply(force_good_e, axis=1)
But then I would need another function for column 'c'.
I improved on that by passing the column name as a parameter:
def force_good_es(row, column):
    col = row[column]
    if 'É' in col:
        col = col.replace('É', 'a')
    return col

df['a'] = df.apply(lambda x: force_good_es(x, 'a'), axis=1)
df['c'] = df.apply(lambda x: force_good_es(x, 'c'), axis=1)
But it got me wondering: is there a better way to do this?
That is, a way that eliminates the need to write one line of
df[n] = df.apply(lambda x: force_good_es(x, n), axis=1)
for each column n that needs to be fixed.
You could use str.replace on each column:
df['a'] = df['a'].str.replace('É','a')
df['c'] = df['c'].str.replace('É','a')
Or, as @wen mentioned in the comments, replace across the whole DataFrame at once:
df = df.replace({'É':'a'},regex=True)
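If only some of the columns should be touched, one possible variation is to select them first and apply the same replace call to that subset. This is only a sketch: the name cols_to_fix is a hypothetical list introduced for illustration, and the DataFrame is the sample one from the question.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({'a': 'bÉd encoded',
                   'b': ['foo', 'bar'] * 3,
                   'c': 'bÉd encoded too'})

# Hypothetical list of the columns that need fixing (an assumption).
cols_to_fix = ['a', 'c']

# regex=True makes replace() match substrings inside each cell
# instead of requiring the whole cell value to equal 'É'.
df[cols_to_fix] = df[cols_to_fix].replace({'É': 'a'}, regex=True)

print(df)
```

Column 'b' is left alone here, which may matter when other columns could legitimately contain the character being replaced.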