How to calculate a ratio from two different pandas DataFrames

user6083088 Published at Dev

I'm trying to calculate the sales ratio of each retailer-SKU pair for every week in a period, and then the mean of those ratios across the weeks.

So far I've been able to calculate the total sales per week for each SKU, and I have grouped the sales of each retailer-SKU pair by week.

What I can't work out is how to calculate the ratio of sales across N weeks for each retailer-SKU pair.

Here is my code:

score_period = [
    [201636, 201643],
    [201640, 201647],
    [201645, 201652],
    [201649, 201704],
    [201701, 201708]
]


sku_group = df.groupby('Sku', as_index=False)
sku_list = sku_group.groups.keys()

for sku in sku_list:

    df_sku = df[df['Sku'] == sku]
    for period in score_period:
        df_period = df_sku[(df_sku['Week'] >= period[0]) &
                           (df_sku['Week'] <= period[1])]

        # sales of each week in period
        df_sum = df_period.groupby(['Week'], as_index=False)['WeekSales'].sum()
        # retailer sales sum per week
        sums = df_period.groupby(['Week', 'RetailerCode'], as_index=False)['WeekSales'].sum()

        for index, rows in sums.iterrows():
            sums['ratio'] = sums['WeekSales'] / df_sum[(df_sum['Week'])]['WeekSales']

Data

import pandas as pd

sales = [
    {'RetailerCode': 'RET001', 'Sku': 'SKU001', 'Week': 201636, 'WeekSales': 10},
    {'RetailerCode': 'RET002', 'Sku': 'SKU002', 'Week': 201636, 'WeekSales': 20},
    {'RetailerCode': 'RET003', 'Sku': 'SKU003', 'Week': 201636, 'WeekSales': 0},
    {'RetailerCode': 'RET004', 'Sku': 'SKU004', 'Week': 201636, 'WeekSales': 10},
    {'RetailerCode': 'RET001', 'Sku': 'SKU001', 'Week': 201637, 'WeekSales': 5},
    {'RetailerCode': 'RET002', 'Sku': 'SKU002', 'Week': 201637, 'WeekSales': 10},
    {'RetailerCode': 'RET003', 'Sku': 'SKU003', 'Week': 201637, 'WeekSales': 20},
    {'RetailerCode': 'RET004', 'Sku': 'SKU004', 'Week': 201637, 'WeekSales': 3},
]

df = pd.DataFrame(sales)

Expected results:

RET001 avg ratio = (Ratio of first week + Ratio of second week) / 2
RET002 avg ratio = (Ratio of first week + Ratio of second week) / 2

Tien Liang

Explanation

In the last for-loop, you should work with rows (the current row), not sums (the whole table).
Because you are iterating through the table, you cannot fill the column with sums['ratio'], which assigns to the whole column at once. Use sums.loc[index, 'ratio'] to set the value for the current row only.
To match the week in df_sum and sums, use df_sum[df_sum['Week'] == rows['Week']]['WeekSales']. This returns the WeekSales value in df_sum whose Week matches the Week of the current row.
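
The three points above can be tried in isolation on a tiny example. This is only a sketch: the two small frames are hypothetical stand-ins for df_sum and sums, built by hand with totals taken from the sample data, and the names row and week_total are introduced here for illustration.

```python
import pandas as pd

# Hypothetical miniature stand-ins for df_sum and sums from the question.
df_sum = pd.DataFrame({'Week': [201636, 201637], 'WeekSales': [40, 38]})
sums = pd.DataFrame({
    'Week': [201636, 201637],
    'RetailerCode': ['RET001', 'RET001'],
    'WeekSales': [10, 5],
})

for index, row in sums.iterrows():
    # Boolean mask: keep only the df_sum row whose Week equals this row's Week,
    # then take the single matching total.
    week_total = df_sum.loc[df_sum['Week'] == row['Week'], 'WeekSales'].iloc[0]
    # .loc[index, 'ratio'] writes one cell; sums['ratio'] = ... would write the whole column.
    sums.loc[index, 'ratio'] = row['WeekSales'] / week_total

print(sums)
```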

Please check whether the code below is what you are looking for.

score_period = [
    [201636, 201643],
    [201640, 201647],
    [201645, 201652],
    [201649, 201704],
    [201701, 201708]
]
sku_group = df.groupby('Sku', as_index=False)
sku_list = sku_group.groups.keys()


#for sku in sku_list:
#  df_sku = df[df['Sku'] == sku]
for period in score_period:
    df_period = df[(df['Week'] >= period[0]) & (df['Week'] <= period[1])]

    # sales of each week in period
    df_sum = df_period.groupby(['Week'], as_index=False)['WeekSales'].sum()
    # retailer sales sum per week
    sums = df_period.groupby(['Week', 'RetailerCode'], as_index=False)['WeekSales'].sum()
    for index, rows in sums.iterrows():
        sums.loc[index,'ratio'] = (rows['WeekSales']/df_sum[df_sum['Week']==rows['Week']]['WeekSales']).values

Result for the first period (201636 to 201643), the only one that contains the sample weeks:

     Week RetailerCode  WeekSales     ratio
0  201636       RET001         10  0.250000
1  201636       RET002         20  0.500000
2  201636       RET003          0  0.000000
3  201636       RET004         10  0.250000
4  201637       RET001          5  0.131579
5  201637       RET002         10  0.263158
6  201637       RET003         20  0.526316
7  201637       RET004          3  0.078947
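
The question also asks for the average ratio per retailer, which the loop above does not compute. One possible way to get both the ratio and its mean without iterrows is sketched below. It is an alternative, not the approach used in this answer: it assumes a single hypothetical period (start, end), and like the code above it ignores Sku when computing the weekly total. Adding 'Sku' to the groupby keys would be one way to split the ratio per retailer-SKU pair.

```python
import pandas as pd

# Sample data from the question.
sales = [
    {'RetailerCode': 'RET001', 'Sku': 'SKU001', 'Week': 201636, 'WeekSales': 10},
    {'RetailerCode': 'RET002', 'Sku': 'SKU002', 'Week': 201636, 'WeekSales': 20},
    {'RetailerCode': 'RET003', 'Sku': 'SKU003', 'Week': 201636, 'WeekSales': 0},
    {'RetailerCode': 'RET004', 'Sku': 'SKU004', 'Week': 201636, 'WeekSales': 10},
    {'RetailerCode': 'RET001', 'Sku': 'SKU001', 'Week': 201637, 'WeekSales': 5},
    {'RetailerCode': 'RET002', 'Sku': 'SKU002', 'Week': 201637, 'WeekSales': 10},
    {'RetailerCode': 'RET003', 'Sku': 'SKU003', 'Week': 201637, 'WeekSales': 20},
    {'RetailerCode': 'RET004', 'Sku': 'SKU004', 'Week': 201637, 'WeekSales': 3},
]
df = pd.DataFrame(sales)

# Assumption: one period only; loop over score_period to handle several.
start, end = 201636, 201643
df_period = df[df['Week'].between(start, end)]  # bounds are inclusive by default

# Retailer sales per week.
sums = df_period.groupby(['Week', 'RetailerCode'], as_index=False)['WeekSales'].sum()

# Weekly total aligned to every row of sums, so no row-by-row lookup is needed.
week_total = sums.groupby('Week')['WeekSales'].transform('sum')
sums['ratio'] = sums['WeekSales'] / week_total

# Mean of the weekly ratios for each retailer.
avg_ratio = sums.groupby('RetailerCode')['ratio'].mean()
print(avg_ratio)
```

A week whose total sales are zero would produce NaN or inf ratios, so such weeks may need to be filtered out before taking the mean.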
