I want to create another column based on the previous week's sales. Here is a sample input:
import pandas as pd
df = pd.DataFrame({'Week':[1,1,2,2,3,3,4,4,5,5,1,1,2,2,3,3,4,4,5,5],
'Category':['Red','White','Red','White','Red','White','Red','White','Red','White','Red','White','Red','White','Red','White','Red','White','Red','White'],
'id':[1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2],
'Sales':[100,200,300,400,100,200,300,400,100,200,100,200,300,400,100,200,300,400,100,200],
'Sales_others':[10,20,30,40,10,20,30,40,10,20,10,20,30,40,10,20,30,40,10,20]})
print(df)
Based on this, I want to create another column that holds the previous week's sales. Here is a sample of the desired output:
df_output = pd.DataFrame({'Week':[1,1,2,2,3,3,4,4,5,5,1,1,2,2,3,3,4,4,5,5],
'Category':['Red','White','Red','White','Red','White','Red','White','Red','White','Red','White','Red','White','Red','White','Red','White','Red','White'],
'id':[1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2],
'Sales':[100,200,300,400,100,200,300,400,100,200,100,200,300,400,100,200,300,400,100,200],
'Sales_others':[10,20,30,40,10,20,30,40,10,20,10,20,30,40,10,20,30,40,10,20],
'Sales_previous_week':[0,0,100,200,300,400,100,200,300,400,0,0,100,200,300,400,100,200,300,400]})
print(df_output)
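I am finding it hard to build this with a self-join. The previous-week value should depend only on the Sales column, and I need to be able to keep the Sales_others column.
- Edit: adding the original code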
CR_UK_NL_Weeklevel['PREVIOUS_WEEK'] = CR_UK_NL_Weeklevel.groupby(['RETAIL_SITE_ID','CATEGORY_NAME'])['CURRENT_WEEK'].shift(fill_value=0)
print(CR_UK_NL_Weeklevel)
Renaming the columns:
CR_UK_NL_Weeklevel.columns.values[4] = 'CURRENT_WEEK'
CR_UK_NL_Weeklevel.columns.values[3] = 'LAST_YEAR_WEEK'
CR_UK_NL_Weeklevel.columns.values
Trying to implement the solution:
CR_UK_NL_Weeklevel['PREVIOUS_WEEK'] = CR_UK_NL_Weeklevel.groupby(['RETAIL_SITE_ID','CATEGORY_NAME'])['CURRENT_WEEK'].shift(fill_value=0)
print(CR_UK_NL_Weeklevel)
- Error
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
----> 1 CR_UK_NL_Weeklevel['PREVIOUS_WEEK'] = CR_UK_NL_Weeklevel.groupby(['RETAIL_SITE_ID','CATEGORY_NAME'])['CURRENT_WEEK'].shift(fill_value=0)
      2 print(CR_UK_NL_Weeklevel)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\base.py in __getitem__(self, key)
    273         else:
    274             if key not in self.obj:
--> 275                 raise KeyError("Column not found: {key}".format(key=key))
    276             return self._gotitem(key, ndim=1)
    277

KeyError: 'Column not found: CURRENT_WEEK'
If every week contains the same categories and the weeks are consecutive, use DataFrameGroupBy.shift, grouping by the Category column:
df['Sales_PREVIOUS'] = df.groupby('Category')['Sales'].shift(fill_value=0)
print (df)
   Week Category  Sales  Sales_PREVIOUS
0     1      Red    100               0
1     1    White    200               0
2     2      Red    300             100
3     2    White    400             200
4     3      Red    100             300
5     3    White    200             400
6     4      Red    300             100
7     4    White    400             200
8     5      Red    100             300
9     5    White    200             400
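One thing worth spelling out: shift works on row order within each group, not on the Week values themselves. A minimal sketch, assuming the rows might not arrive sorted by week (df_unsorted is made-up data for illustration, not from the question), is to sort before shifting and let pandas align the result back on the original index:

```python
import pandas as pd

# Hypothetical data: same columns as the example, but rows are out of week order.
df_unsorted = pd.DataFrame({
    'Week': [3, 1, 2, 2, 1, 3],
    'Category': ['Red', 'Red', 'White', 'Red', 'White', 'White'],
    'Sales': [100, 100, 400, 300, 200, 200],
})

# Sort by Week so that "previous row in the group" means "previous week",
# then shift within each Category. The assignment aligns on the original
# index, so the row order of df_unsorted itself is left unchanged.
df_unsorted['Sales_PREVIOUS'] = (
    df_unsorted.sort_values('Week')
               .groupby('Category')['Sales']
               .shift(fill_value=0)
)
print(df_unsorted)
```

This still assumes one row per category per week and no missing weeks.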
Another idea is to reshape with DataFrame.pivot, then apply DataFrame.shift, convert back to a Series with DataFrame.stack, and finally add the new column with DataFrame.join:
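# keyword arguments are used because recent pandas versions no longer accept these positionally
s = df.pivot(index='Week', columns='Category', values='Sales').shift(fill_value=0).stack()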
df = df.join(s.rename('Sales_PREVIOUS WEEK'), on=['Week','Category'])
Edit:
With the new data, add the id column to the grouping:
df['Sales_PREVIOUS'] = df.groupby(['id','Category'])['Sales'].shift(fill_value=0)
And for the second solution:
s = df.set_index(['Week','id','Category'])['Sales'].unstack([1,2]).shift(fill_value=0).unstack()
df = df.join(s.rename('Sales_PREVIOUS WEEK'), on=['id','Category','Week'])
print (df)
    Week Category  id  Sales  Sales_others  Sales_PREVIOUS WEEK
0      1      Red   1    100            10                    0
1      1    White   1    200            20                    0
2      2      Red   1    300            30                  100
3      2    White   1    400            40                  200
4      3      Red   1    100            10                  300
5      3    White   1    200            20                  400
6      4      Red   1    300            30                  100
7      4    White   1    400            40                  200
8      5      Red   1    100            10                  300
9      5    White   1    200            20                  400
10     1      Red   2    100            10                    0
11     1    White   2    200            20                    0
12     2      Red   2    300            30                  100
13     2    White   2    400            40                  200
14     3      Red   2    100            10                  300
15     3    White   2    200            20                  400
16     4      Red   2    300            30                  100
17     4    White   2    400            40                  200
18     5      Red   2    100            10                  300
19     5    White   2    200            20                  400
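Edit:
The problem is the column names. Writing into CR_UK_NL_Weeklevel.columns.values changes the underlying array in place, and the column index does not reliably pick up the new labels, so CURRENT_WEEK is not found. Assign a new list of column names instead: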
cols = CR_UK_NL_Weeklevel.columns.tolist()
cols[4] = 'CURRENT_WEEK'
cols[3] = 'LAST_YEAR_WEEK'
CR_UK_NL_Weeklevel.columns = cols
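An equivalent option, sketched here on a stand-in frame because the real column names of CR_UK_NL_Weeklevel are not shown in the question, is DataFrame.rename with a mapping built from the current labels at positions 3 and 4. The frame name and the placeholder labels col_3 and col_4 are assumptions for illustration only:

```python
import pandas as pd

# Hypothetical stand-in for CR_UK_NL_Weeklevel; col_3 and col_4 are placeholder labels.
frame = pd.DataFrame(
    [[1, 'Red', 1, 90, 100]],
    columns=['RETAIL_SITE_ID', 'CATEGORY_NAME', 'WEEK', 'col_3', 'col_4'],
)

# Map whatever labels currently sit at positions 3 and 4 to the new names.
frame = frame.rename(columns={frame.columns[3]: 'LAST_YEAR_WEEK',
                              frame.columns[4]: 'CURRENT_WEEK'})

# The new label can now be used for selection after groupby.
print(frame.columns.tolist())
print(frame.groupby(['RETAIL_SITE_ID', 'CATEGORY_NAME'])['CURRENT_WEEK'].shift(fill_value=0))
```

rename returns a new DataFrame, so the result has to be assigned back as shown.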