Regex to strip column names and rename the columns


s_khan92

I have a df with many columns and, because it is survey data, each column contains repeated values. As an example, my data looks like this:

df:

 Q36r9: sales platforms - Before purchasing a new car         Q36r32: Advertising letters - Before purchasing a new car
        Not Selected                                                                         Selected

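So I want to strip text out of the column names. For example, from the first column I want the text between ":" and "-", which gives "sales platforms". Second, I want to convert the column values: "Selected" should be replaced with the name of the column, and "Not Selected" should be replaced with NaN.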

So the desired output looks like this:

sales platforms                                       Advertising letters
      NaN                                             Advertising letters

Edit: another problem, when I have a column name like this:

Q40r1c3: WeChat - Looking for a new car - And now if you think again  - Which social media platforms or sources would you use in each situation?

Here I only want what is between ":" and the first "-", so it should extract "WeChat".

Manakin

Damon;

You can use a regex with .* (a greedy match), which captures everything between the defined patterns:

import re

df.columns = [re.search(':(.*)-',i).group(1) for i in df.columns.str.strip()]

print(df)

   sales platforms   Advertising letters 
0      Not Selected                  None
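One caveat with the list comprehension above: re.search returns None for any header that does not contain both ":" and "-", so .group(1) would raise AttributeError. The following is a minimal sketch of a guard, assuming you want such headers left as they are. The helper name clean_name and the "respondent_id" column are hypothetical and only serve the illustration.

```python
import re

# Same greedy pattern as in the answer: everything between the first ":" and the last "-".
PATTERN = re.compile(r":(.*)-")


def clean_name(col):
    """Hypothetical helper: return the text between ':' and '-', or the header unchanged."""
    match = PATTERN.search(col)
    if match is None:
        # No ":" ... "-" pair in this header, so keep it (minus surrounding whitespace).
        return col.strip()
    return match.group(1).strip()


columns = [
    " Q36r9: sales platforms - Before purchasing a new car",
    "Q36r32: Advertising letters - Before purchasing a new car",
    "respondent_id",  # hypothetical column without the "Q..: label - question" layout
]

cleaned = [clean_name(c) for c in columns]
print(cleaned)
# ['sales platforms', 'Advertising letters', 'respondent_id']
```

With a DataFrame this could be applied as df.columns = [clean_name(c) for c in df.columns].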

Edit:

For column names with more than one "-", you can use the lazy (non-greedy) quantifier +? instead of the greedy match:

+? Quantifier — Matches between one and unlimited times, as few times as possible, expanding as needed (lazy)
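To make the difference concrete, here is a small sketch that runs both patterns on the long column name from the question. The variable names are only for illustration.

```python
import re

name = (
    "Q40r1c3: WeChat - Looking for a new car - And now if you think again"
    "  - Which social media platforms or sources would you use in each situation?"
)

# Greedy: .* runs as far as it can, so the match ends at the LAST "-".
greedy = re.search(r":(.*)-", name).group(1).strip()

# Lazy: .+? stops as early as it can, so the match ends at the FIRST "-".
lazy = re.search(r":(.+?)-", name).group(1).strip()

print(greedy)  # WeChat - Looking for a new car - And now if you think again
print(lazy)    # WeChat
```

For headers with a single "-", such as the Q36 columns, both patterns give the same result.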

Q36r9: sales platforms - Before purchasing a new car    Q40r1c3: WeChat - Looking for a new car - And now if you think again - Which social media platforms or sources would you use in each situation?
0                                                       1


import re

[re.search(':(.+?)-',i).group(1).strip() for i in df.columns]

['sales platforms', 'WeChat']
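The snippets above only deal with the column names. For the second part of the question (turning "Selected" into the column name and "Not Selected" into NaN), one possible approach is sketched below. It assumes every cell holds exactly one of those two strings; the sample frame is rebuilt from the question's example, and Series.map is my choice here, not something the original answer prescribes.

```python
import re

import numpy as np
import pandas as pd

# Sample data rebuilt from the question's example (one respondent).
df = pd.DataFrame(
    {
        "Q36r9: sales platforms - Before purchasing a new car": ["Not Selected"],
        "Q36r32: Advertising letters - Before purchasing a new car": ["Selected"],
    }
)

# Step 1: keep only the text between ":" and the first "-" in each header.
df.columns = [re.search(r":(.+?)-", c).group(1).strip() for c in df.columns]

# Step 2: replace "Selected" with the column's own name and "Not Selected" with NaN.
for col in df.columns:
    df[col] = df[col].map({"Selected": col, "Not Selected": np.nan})

print(df)
```

Note that Series.map turns any value missing from the dictionary into NaN as well, so a column with other answer codes would need a different mapping.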
