I'm trying to calculate sales ratio of retailer-sku across a period of weeks and then calculate the mean of retailer-sku across those weeks.
So far I've been able to calculate the sum of sales across weeks for sku's and then I have grouped the sales of retailer-sku across weeks.
Now I'm unable to find the way to calculate the ratio of sales across 'N' number of weeks of retailer sku.
Here is my code
score_period = [
[201636, 201643],
[201640, 201647],
[201645, 201652],
[201649, 201704],
[201701, 201708]
]
sku_group = df.groupby('Sku', as_index=False)
sku_list = sku_group.groups.keys()
for sku in sku_list:
df_sku = df[df['Sku'] == sku]
for period in score_period:
df_period = df_sku[(df_sku['Week'] >= period[0]) &
(df_sku['Week'] <= period[1])]
# sales of each week in period
df_sum = df_period.groupby(['Week'], as_index=False)['WeekSales'].sum()
# retailer sales sum per week
sums = df_period.groupby(['Week', 'RetailerCode'], as_index=False)['WeekSales'].sum()
for index, rows in sums.iterrows():
sums['ratio'] = sums['WeekSales'] / df_sum[(df_sum['Week'])]['WeekSales']
Data
sales = [
{'RetailerCode': 'RET001', 'Sku': 'SKU001', 'Week': 201636, 'WeekSales': 10},
{'RetailerCode': 'RET002', 'Sku': 'SKU002', 'Week': 201636, 'WeekSales': 20},
{'RetailerCode': 'RET003', 'Sku': 'SKU003', 'Week': 201636, 'WeekSales': 0},
{'RetailerCode': 'RET004', 'Sku': 'SKU004', 'Week': 201636, 'WeekSales': 10},
{'RetailerCode': 'RET001', 'Sku': 'SKU001', 'Week': 201637, 'WeekSales': 5},
{'RetailerCode': 'RET002', 'Sku': 'SKU002', 'Week': 201637, 'WeekSales': 10},
{'RetailerCode': 'RET003', 'Sku': 'SKU003', 'Week': 201637, 'WeekSales': 20},
{'RetailerCode': 'RET004', 'Sku': 'SKU004', 'Week': 201637, 'WeekSales': 3},
]
df = pd.DataFrame(sales)
Expected results:
RET001 avg ratio = (Ratio of first week + Ratio of second week) / 2
RET002 avg ratio = (Ratio of first week + Ratio of second week) / 2
At the last for-loop, you should access rows, not sums (whole table).
Because you are iterate through the tables, you can not add column simply by sum['ratio']
. You have to use sums.loc[index, 'ratio']
(Explanation of this can be found here)
To match the week in df_sum
and sums
, you need to do df_sum[df_sum['Week'] == rows['Week']
. This will return value of WeekSales
in df_sum
that matches Week
in current row.
Please check if the below code is what you are looking for.
score_period = [
[201636, 201643],
[201640, 201647],
[201645, 201652],
[201649, 201704],
[201701, 201708]
]
sku_group = df.groupby('Sku', as_index=False)
sku_list = sku_group.groups.keys()
sku_group = df.groupby('Sku', as_index=False)
sku_list = sku_group.groups.keys()
#for sku in sku_list:
# df_sku = df[df['Sku'] == sku]
for period in score_period:
df_period = df[(df['Week'] >= period[0]) & (df['Week'] <= period[1])]
# sales of each week in period
df_sum = df_period.groupby(['Week'], as_index=False)['WeekSales'].sum()
# retailer sales sum per week
sums = df_period.groupby(['Week', 'RetailerCode'], as_index=False)['WeekSales'].sum()
for index, rows in sums.iterrows():
sums.loc[index,'ratio'] = (rows['WeekSales']/df_sum[df_sum['Week']==rows['Week']]['WeekSales']).values
Result:
Week RetailerCode WeekSales ratio
0 201636 RET001 10 0.250000
1 201636 RET002 20 0.500000
2 201636 RET003 0 0.000000
3 201636 RET004 10 0.250000
4 201637 RET001 5 0.131579
5 201637 RET002 10 0.263158
6 201637 RET003 20 0.526316
7 201637 RET004 3 0.078947
Collected from the Internet
Please contact [email protected] to delete if infringement.
Comments