我有一个每日销售的DataFrame。
我想创建一个KPI,以了解周末销售如何影响周销售以及正常的日子如何影响周销售。
我在下面创建了一个工作代码作为示例,它可以工作。
但:
2)(第二个小问题)如果银行假期的销售额过低或特殊假期的销售额过高,我应该用均值或中位数代替这些离群值吗?
#Generating a Random Sample
np.random.seed(30)
values=np.random.randint(0,400,14)
day= pd.date_range(start='25/02/2020',periods=14)
#Creating an example DataFrame
df_input={'date':day,"values":values}
df_test=pd.DataFrame(df_input)
#From here starts what I have done on my original code
# week and day of the week are two informatio that I have to retrieve from my
#original DataFrame
#Creating a column with Week and Day Of The Week info
df_test['week']=df_test.date.dt.week
df_test['day']=df_test.date.dt.dayofweek
#Creating a Column for differencing weekend from normal day
df_test['weekend']=np.zeros(len(df_test))
df_test['weekend'][(df_test.day==5)|(df_test.day==6)]=1
#Grouping on Week to get total sales
grouped_full=df_test.groupby(['week']).sum()
#Grouping on Week And Weekend to separate normal days from weekend days
grouped=df_test.groupby(['week','weekend']).sum()
#Getting the ratio between
#the weekend and total weekly sales
#normal days and total weekly sales
ratio=grouped/grouped_full
print(ratio['values'])
这是我的输出:
week weekend
9 0.0 0.666667
1.0 0.333333
10 0.0 0.784127
1.0 0.215873
11 0.0 1.000000
我花了一些时间试图真正了解您如何解决该问题以及是否有另一种解决方法。
我绝对不是专业人士,因此,将所有这些加一点盐:
import numpy as np
import pandas as pd
#Generating a Random Sample
np.random.seed(30)
values=np.random.randint(0,400,14)
day= pd.date_range(start='25/02/2020',periods=14)
#Creating an example DataFrame
df_input={'date':day,"values":values}
df_test=pd.DataFrame(df_input)
# group and use mean
weekdays = (df_test[df_test.date.dt.weekday.isin([0,1,2,3,4])]
.groupby(df_test.date.dt.week)
.mean())
# group and use mean
weekends = (df_test[df_test.date.dt.weekday.isin([5,6])]
.groupby(df_test.date.dt.week)
.mean())
# store in dict
d = {'weekdays': weekdays, 'weekends': weekends}
# concat and drop extra level('values')
both = pd.concat(d, axis=1).droplevel(1, axis=1)
# create new column with mean ratio differences
both['weekends_ratio'] = both['weekends'].div(both['weekends']+both['weekdays'])
print(both)
weekdays weekends weekend_ratio
date
9 280 280.0 0.500000
10 247 170.0 0.407674
11 18 NaN NaN
.mean()
(如您在我的代码中看到的)而不是.sum()
查找平均销售额中的比率差异。本文收集自互联网,转载请注明来源。
如有侵权,请联系[email protected] 删除。
我来说两句