My data look like:
   A  B  C Month
0  1  3  5   Jan
1  1  2  3   Feb
I need to: a) convert 'Month' to dummy columns:
df2 = pd.get_dummies(df,columns=['Month'],drop_first=True,prefix = 'm')
b) Multiply A, B and C by each of the generated dummies. The only way I can think of doing this is:
df_Feb = df2[['A','B','C']].multiply(df2['m_Feb'], axis="index")
df_March
...
and then join all the newly created DataFrames, which isn't very convenient. Is there a better way to approach this?
The idea is to give both DataFrames the same MultiIndex columns, built with MultiIndex.from_product and applied with DataFrame.reindex, so that they can be multiplied by each other:
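import pandas as pd

# Split the value columns from the month dummies
df1 = df[['A','B','C']]
df2 = pd.get_dummies(df['Month'])
# Every (value column, month) pair becomes one column of the shared index
mux = pd.MultiIndex.from_product([df1.columns, df2.columns])
# Broadcast each frame across the level its own columns match
df2 = df2.reindex(mux, axis=1, level=1)
df1 = df1.reindex(mux, axis=1, level=0)
df = df1 * df2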
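To see what the two reindex calls are doing, here is a small self-contained sketch on the sample data from the question. The way the DataFrame is constructed, the variable names, and the dtype=int argument are my own assumptions for illustration, not part of the code above.

```python
import pandas as pd

# Sample data from the question (this construction is an assumption)
df = pd.DataFrame({"A": [1, 1], "B": [3, 2], "C": [5, 3],
                   "Month": ["Jan", "Feb"]})

values = df[["A", "B", "C"]]
# dtype=int gives 0/1 integers instead of booleans
dummies = pd.get_dummies(df["Month"], dtype=int)

# One column per (value column, month) pair
mux = pd.MultiIndex.from_product([values.columns, dummies.columns])

# Each value column is repeated once per month (matched on level 0)
values_wide = values.reindex(mux, axis=1, level=0)
# Each dummy column is repeated once per value column (matched on level 1)
dummies_wide = dummies.reindex(mux, axis=1, level=1)

# Both frames now share the same columns, so this is an element-wise product
product = values_wide * dummies_wide
print(product)
```

Printing values_wide and dummies_wide before multiplying is a quick way to check that the columns line up as expected.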
Finally, an ordered CategoricalIndex is used to sort the months in calendar order, and the column names are flattened with f-strings:
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
# Rebuild the columns so the month level sorts by calendar order
df.columns = pd.MultiIndex.from_arrays([
    df.columns.get_level_values(0),
    pd.CategoricalIndex(df.columns.get_level_values(1), categories=months, ordered=True),
])
df = df.sort_index(axis=1)
# Flatten ('A', 'Jan') to 'A_Jan'
df.columns = df.columns.map(lambda x: f'{x[0]}_{x[1]}')
print(df)
   A_Jan  A_Feb  B_Jan  B_Feb  C_Jan  C_Feb
0      1      0      3      0      5      0
1      0      1      0      2      0      3
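If you want to sanity-check that result, one option is to build the same columns a different way. The sketch below is not the MultiIndex approach from above; it is a plain dict comprehension that multiplies each value column by a month indicator. The DataFrame construction and the names value_cols, present and check are assumptions for illustration.

```python
import pandas as pd

# Sample data from the question (this construction is an assumption)
df = pd.DataFrame({"A": [1, 1], "B": [3, 2], "C": [5, 3],
                   "Month": ["Jan", "Feb"]})

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
value_cols = ["A", "B", "C"]

# Keep only the months that occur in the data, in calendar order
present = [m for m in months if m in set(df["Month"])]

# One column per (value column, month) pair: the value where the row
# belongs to that month, 0 elsewhere
check = pd.DataFrame({
    f"{col}_{m}": df[col] * (df["Month"] == m).astype(int)
    for col in value_cols
    for m in present
})
print(check)
```

This loops in Python over every column and month pair, so it is mainly useful as a cross-check or for a small number of columns.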