Convert/unpivot multiple columns into rows in a Python DataFrame

Shashank

I have a dataset that I need to unpivot into multiple rows.

For example:

id  cor_id1  mail11  mail12  mail13  cor_id2  mail21  mail22  mail23  cor_id3  mail31  mail32  mail33
1   1        a@123   b@234   c@123   2        a@def   b@fgh   c@asd   3        s@wer   b@ert   e@rty
2   4        e@234   e@234   e@qwe   9        e@dfe   f@jfg   r@ert   10       e@wer   g@wer   e@ert

I need to unpivot it as follows:

id cor_id mail
1   1     a@123
1   1     b@234
1   1     c@123
1   2     a@def
1   2     b@fgh
1   2     c@asd
1   3     s@wer
1   3     b@ert
1   3     e@rty
2   4     e@234
2   4     e@234
2   4     e@qwe
2   9     e@dfe
2   9     f@jfg
2   9     r@ert
2   10    e@wer
2   10    g@wer
2   10    e@ert

I tried df.melt, but it only gives me a single value column.
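For reference, here is a minimal sketch of what a plain melt does with this layout. The DataFrame is rebuilt by hand from the sample table above, and the name `melted` is only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'id': [1, 2],
    'cor_id1': [1, 4],
    'mail11': ['a@123', 'e@234'], 'mail12': ['b@234', 'e@234'], 'mail13': ['c@123', 'e@qwe'],
    'cor_id2': [2, 9],
    'mail21': ['a@def', 'e@dfe'], 'mail22': ['b@fgh', 'f@jfg'], 'mail23': ['c@asd', 'r@ert'],
    'cor_id3': [3, 10],
    'mail31': ['s@wer', 'e@wer'], 'mail32': ['b@ert', 'g@wer'], 'mail33': ['e@rty', 'e@ert'],
})

# A plain melt stacks every non-id column into one "value" column,
# so the cor_id values and the mail values end up mixed together.
melted = df.melt(id_vars='id')
print(melted.columns.tolist())
print(len(melted))
```

This prints `['id', 'variable', 'value']` and `24`: one row per original cell, with nothing linking a mail to its cor_id. That is why an extra step is needed to pair each mail column with its group.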

What if the data has several columns that need to be carried into the rows?

id  cor_id1  ad1  mail11  mail12  mail13  cor_id2  ad2  mail21  mail22  mail23  cor_id3  ad3  mail31  mail32  mail33
1   1        23   a@123   b@234   c@123   2        24   a@def   b@fgh   c@asd   3        25   s@wer   b@ert   e@rty
2   4        33   e@234   e@234   e@qwe   9        34   e@dfe   f@jfg   r@ert   10       35   e@wer   g@wer   e@ert

And I want:

id  cor_id  ad  mail
1   1       23  a@123
1   1       23  b@234
1   1       23  c@123
1   2       24  a@def
1   2       24  b@fgh
1   2       24  c@asd
1   3       25  s@wer
1   3       25  b@ert
1   3       25  e@rty
2   4       33  e@234
2   4       33  e@234
2   4       33  e@qwe
2   9       34  e@dfe
2   9       34  f@jfg
2   9       34  r@ert
2   10      35  e@wer
2   10      35  g@wer
2   10      35  e@ert

jezrael

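Use wide_to_long, but first rename the cor_id columns by appending 1, so that each one shares its numeric suffix with the first mail column of its group (cor_id1 becomes cor_id11, matching mail11). Sort the result by id and suffix before forward-filling cor_id:

import pandas as pd

# cor_id1 -> cor_id11, cor_id2 -> cor_id21, cor_id3 -> cor_id31
df = df.rename(columns=lambda x: x + '1' if x.startswith('cor_id') else x)
df = pd.wide_to_long(df, ['cor_id', 'mail'], i='id', j='i').sort_index()
# cor_id is only set on the first row of each group, so fill it downwards
df['cor_id'] = df['cor_id'].ffill()
df = df.reset_index(level=1, drop=True).reset_index()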

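Another option is to append 0 to the cor_id column names. Each cor_id then gets a row of its own ahead of the mail rows of its group, and those rows, which have no mail value, are removed afterwards with dropna:

df = df.rename(columns=lambda x: x + '0' if x.startswith('cor_id') else x)
# sort so that each cor_id row comes directly before its mail rows
df = pd.wide_to_long(df, ['cor_id', 'mail'], i='id', j='i').sort_index()
df['cor_id'] = df['cor_id'].ffill()
df = df.dropna(subset=['mail']).reset_index(level=1, drop=True).reset_index()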

print(df)
    id  cor_id   mail
0    1     1.0  a@123
1    1     1.0  b@234
2    1     1.0  c@123
3    1     2.0  a@def
4    1     2.0  b@fgh
5    1     2.0  c@asd
6    1     3.0  s@wer
7    1     3.0  b@ert
8    1     3.0  e@rty
9    2     4.0  e@234
10   2     4.0  e@234
11   2     4.0  e@qwe
12   2     9.0  e@dfe
13   2     9.0  f@jfg
14   2     9.0  r@ert
15   2    10.0  e@wer
16   2    10.0  g@wer
17   2    10.0  e@ert
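The append-0 approach can be sketched end to end as follows, with the sample DataFrame rebuilt by hand from the question. Two steps are assumptions added here rather than part of the answer: the explicit sort_index() call, so that the forward fill does not depend on the row order wide_to_long happens to return, and the final cast of cor_id back to integers. The names `wide`, `long_df` and `out` are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'id': [1, 2],
    'cor_id1': [1, 4],
    'mail11': ['a@123', 'e@234'], 'mail12': ['b@234', 'e@234'], 'mail13': ['c@123', 'e@qwe'],
    'cor_id2': [2, 9],
    'mail21': ['a@def', 'e@dfe'], 'mail22': ['b@fgh', 'f@jfg'], 'mail23': ['c@asd', 'r@ert'],
    'cor_id3': [3, 10],
    'mail31': ['s@wer', 'e@wer'], 'mail32': ['b@ert', 'g@wer'], 'mail33': ['e@rty', 'e@ert'],
})

# cor_id1 -> cor_id10, so its suffix sorts just before mail11, mail12, mail13.
wide = df.rename(columns=lambda c: c + '0' if c.startswith('cor_id') else c)

long_df = pd.wide_to_long(wide, ['cor_id', 'mail'], i='id', j='i')
# Assumption: sort explicitly rather than rely on the order wide_to_long returns.
long_df = long_df.sort_index()
# Each cor_id sits on its own row (suffix 10, 20, 30); copy it down onto the mail rows.
long_df['cor_id'] = long_df['cor_id'].ffill()

# Drop the cor_id-only rows, then turn the index back into an ordinary id column.
out = (long_df.dropna(subset=['mail'])
              .reset_index(level=1, drop=True)
              .reset_index())
# Assumption: cor_id holds whole numbers, so casting back from float is safe.
out['cor_id'] = out['cor_id'].astype(int)
print(out)
```

The float values appear because the intermediate frame contains NaN in cor_id before the fill. The cast at the end is optional and only makes sense when every cor_id is an integer.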

EDIT: If there are several columns like cor_id, pass a tuple of prefixes to startswith, then forward-fill all of those columns at once by passing a list of their names:

df = df.rename(columns=lambda x: x + '0' if x.startswith(('cor_id', 'ad')) else x)
# sort so that each cor_id/ad row comes directly before its mail rows
df = pd.wide_to_long(df, ['cor_id', 'ad', 'mail'], i='id', j='i').sort_index()
df[['cor_id', 'ad']] = df[['cor_id', 'ad']].ffill()
df = df.dropna(subset=['mail']).reset_index(level=1, drop=True).reset_index()
print(df)
    id  cor_id    ad   mail
0    1     1.0  23.0  a@123
1    1     1.0  23.0  b@234
2    1     1.0  23.0  c@123
3    1     2.0  24.0  a@def
4    1     2.0  24.0  b@fgh
5    1     2.0  24.0  c@asd
6    1     3.0  25.0  s@wer
7    1     3.0  25.0  b@ert
8    1     3.0  25.0  e@rty
9    2     4.0  33.0  e@234
10   2     4.0  33.0  e@234
11   2     4.0  33.0  e@qwe
12   2     9.0  34.0  e@dfe
13   2     9.0  34.0  f@jfg
14   2     9.0  34.0  r@ert
15   2    10.0  35.0  e@wer
16   2    10.0  35.0  g@wer
17   2    10.0  35.0  e@ert
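As an alternative sketch, not taken from the answer above, the same table can be built without renaming or forward-filling. It melts the mail columns, reshapes the per-group columns separately, and merges the two on the group number. It assumes the group number is the single digit right after the `mail` prefix (mail21 belongs to group 2). The names `mail_cols`, `col`, `grp`, `mails`, `groups` and `out` are illustrative.

```python
import pandas as pd

# Sample data copied from the question (second layout, with the ad columns).
df = pd.DataFrame({
    'id': [1, 2],
    'cor_id1': [1, 4], 'ad1': [23, 33],
    'mail11': ['a@123', 'e@234'], 'mail12': ['b@234', 'e@234'], 'mail13': ['c@123', 'e@qwe'],
    'cor_id2': [2, 9], 'ad2': [24, 34],
    'mail21': ['a@def', 'e@dfe'], 'mail22': ['b@fgh', 'f@jfg'], 'mail23': ['c@asd', 'r@ert'],
    'cor_id3': [3, 10], 'ad3': [25, 35],
    'mail31': ['s@wer', 'e@wer'], 'mail32': ['b@ert', 'g@wer'], 'mail33': ['e@rty', 'e@ert'],
})

mail_cols = [c for c in df.columns if c.startswith('mail')]

# One row per (id, mail column).
mails = df.melt(id_vars='id', value_vars=mail_cols, var_name='col', value_name='mail')
# Assumption: a single-digit group number follows 'mail', e.g. 'mail21' -> 2.
mails['grp'] = mails['col'].str[4].astype(int)

# One row per (id, group) holding that group's cor_id and ad.
groups = pd.wide_to_long(df.drop(columns=mail_cols), ['cor_id', 'ad'],
                         i='id', j='grp').reset_index()

# Attach the group-level values to every mail of that group.
out = (mails.merge(groups, on=['id', 'grp'])
            .sort_values(['id', 'col'])
            .reset_index(drop=True)[['id', 'cor_id', 'ad', 'mail']])
print(out)
```

Because no missing values are introduced along the way, cor_id and ad keep their integer dtype here. The trade-off is the hard-coded assumption about how the group number is encoded in the mail column names.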
