複数レベルの集計合計をデータフレームから時系列列に取得する方法

debugcn 投稿 Dev

ダンボ

さまざまな階層レベルで月次カウントがあるパンダデータフレームがあります。これは長い形式であり、集計の各レベルの列を持つ広い形式に変換したいと思います。

次の形式です。

date | country | state | county | population 
01-01| cc1     | s1    | c1     | 5
01-01| cc1     | s1    | c2     | 4
01-01| cc1     | s2    | c1     | 10
01-01| cc1     | s2    | c2     | 11
02-01| cc1     | s1    | c1     | 6
02-01| cc1     | s1    | c2     | 5
02-01| cc1     | s2    | c1     | 11
02-01| cc1     | s2    | c2     | 12
.
.

これを次の形式に変換したいと思います。

date | country_pop| s1_pop | s2_pop| .. | s1_c1_pop | s1_c2_pop| s2_c1_pop | s2_c2_pop|..

01-01| 30         | 9      | 21    | ...| 5         | 4        | 10         | 11        |..
02-01| 34         | 11     | 23    | ...| 6         | 5        | 11         | 12        |..
.
.

状態の総数は、4、s1 .... s4です。

また、各州の郡にはc1 .... c10というラベルを付けることができます（州によってはそれより少ない場合があり、それらの列をゼロにします。）

集計の各レベルで、日付順に時系列を取得したいと思います。どうすればこれを入手できますか？

スコットボストン

この方法で、レベルパラメータを指定したsumと、すべてのデータフレームを一緒にpd.concatを使用して実行しましょう。

#Aggregate to lowest level of detail
df_agg = df.groupby(['country', 'date', 'state', 'county'])[['population']].sum()

#Reshape dataframe and flatten multiindex column header
df_county = df_agg.unstack([-1, -2])
df_county.columns = [f'{s}_{c}_{p}' for p, c, s in df_county.columns]

#Sum to next level of detail and reshape
df_state = df_agg.sum(level=[0, 1, 2]).unstack()
df_state.columns = [f'{s}_{p}' for p, s in df_state.columns]

#Sum to country level 
df_country = df_agg.sum(level=[0, 1])

#pd.concat horizontally with axis=1
df_out = pd.concat([df_country, df_state, df_county], axis=1).reset_index()

出力：

  country   date  population  s1_population  s2_population  s1_c1_population  \
0     cc1  01-01          30              9             21                 5   
1     cc1  02-01          34             11             23                 6   

   s1_c2_population  s2_c1_population  s2_c2_population  
0                 4                10                11  
1                 5                11                12

この記事はインターネットから収集されたものであり、転載の際にはソースを示してください。

侵害の場合は、連絡してください[email protected]