I have a DataFrame that looks like this:
product units_sold week
sku 5 W01
sku 3 W02
sku 2 W03
sku 4 W04
sku 6 W05
sku 5 W36
sku 3 W38
sku 2 W39
sku 4 W40
In the 'week', w37 is missing, and columns are in a wrong sequence:
I have a list with all needed rows and right sequence:
week_list = ['W36','W37','W38','W39','W40','W01','W02','W03','W04','W05']
desired output is:
product units_sold week
sku 5 W36
sku 0 W37
sku 5 W38
sku 2 W39
sku 4 W40
sku 6 W01
sku 3 W02
sku 2 W03
sku 4 W04
sku 6 W05
Where W37 is on place and 'unit_sold' is 0
Just add one column solution is not suited, as I have quite a big DataFrame, and there is probably can be other missing rows.
I tried with pd.sort_values and pd.categorical:
def sorter(column):
reorder = week_list
cat = pd.Categorical(column, categories=reorder, ordered=True)
return pd.Series(cat)
df.sort_values(by="week", key=sorter)
this helped me to set the right sequence, but 'w37' is still missing, so the problem is not solved.
Is there any way I can update a DataFrame rows values based on this list?
IIUC just use reindex
:
print (df.set_index("week").reindex(week_list).fillna({"product":"sku", "units_sold": 0}))
product units_sold
week
W36 sku 5.0
W37 sku 0.0
W38 sku 3.0
W39 sku 2.0
W40 sku 4.0
W01 sku 5.0
W02 sku 3.0
W03 sku 2.0
W04 sku 4.0
W05 sku 6.0
Collected from the Internet
Please contact [email protected] to delete if infringement.
Comments