我们下面的数据集中有逐月数据,需要确定各月之间的百分比。
ID Jan 1 Feb 2 Mar 3
1 50 40 60 55 45 37
2 100 92 100 80 100 30
3 110 89 110 0 120 119
4 200 195 0 0 125 120
5 0 0 0 0 125 120
百分比需要计算为= 1 / Jan * 100如果该百分比低于90,则需要在结果列中将该列标记为逗号分隔。
预期结果:
ID Jan 1 %_1 Feb 2 %_2 Mar 3 %_3 Result
1 50 40 80 60 55 91.67 45 37 82.22 1,3
2 100 92 92 100 80 80 100 30 30 2,3
3 110 89 80.91 110 0 0 120 119 99.17 1,2
4 200 195 97.5 0 0 0 125 120 96 1,2
5 0 0 0 0 0 0 125 120 96 1,2
编辑:
L = {1: 91.0, 2: 105.0, 3: 96.0, 4: 126.0, 5: 125.0, 6: 139.0, 7: 120.0,
8: 145.0, 9: 116.0,
'Apr': 134.0, 'Aug': 150.0, 'Feb': 108.0, 'Jan': 91.0,
'Jul': 128.0, 'Jun': 147.0,
'Mar': 102.0, 'May': 134.0, 'Sep': 116.0, 'id': 494}
L1 = {1: 10.0, 2: 105.0, 3: 96.0, 4: 126.0, 5: 20.0, 6: 139.0, 7: 120.0, 8: 52.0, 9: 116.0,
'Apr': 134.0, 'Aug': 150.0, 'Feb': 108.0, 'Jan': 91.0, 'Jul': 128.0, 'Jun': 147.0,
'Mar': 102.0, 'May': 134.0, 'Sep': 12.0, 'id': 496}
df = pd.DataFrame([L, L1])
#convert id to index
df1 = df.set_index('id')
#test if columns names are months
mask = pd.to_datetime(df1.columns, format='%b', errors='coerce').notna()
#convert months to categoricals and sorting
df2 = df1.loc[:, mask]
cats = ['Jan', 'Feb', 'Mar', 'Apr','May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
df2.columns = pd.CategoricalIndex(df2.columns, categories=cats, ordered=True)
df2 = df2.sort_index(axis=1)
# print (df2)
#extract not months columns
df3 = df1.loc[:, ~mask]
# print (df3)
#VERY IMPORTANT TEST IF BOTH LENGTHS MATCH
print (len(df2.columns) == len(df3.columns))
#divide by df2 converted to numpy
df4 = df3.div(df2.to_numpy()).mul(100)
# print (df4)
#created new column by dot for matrix multiplication
res = df4.lt(90).dot(df4.columns.astype(str) + ',').str.strip(',')
# print (res)
#dict for replace columns names
d = dict(zip(df3.columns, '%_' + df2.columns.astype(str) + ' ' + df3.columns.astype(str) ))
# print (d)
#join together
df = pd.concat([df3, df4.rename(columns=d), res.rename('Result')], axis=1)
# print (df)
#change ordering
order = [i for x in df3.columns for i in (x, d[x])] + ['Result']
# print (order)
df = df[order]
print (df)
1 %_Jan 1 2 %_Feb 2 3 %_Mar 3 4 %_Apr 4 \
id
494 91.0 100.000000 105.0 97.222222 96.0 94.117647 126.0 94.029851
496 10.0 10.989011 105.0 97.222222 96.0 94.117647 126.0 94.029851
5 %_May 5 6 %_Jun 6 7 %_Jul 7 8 %_Aug 8 \
id
494 125.0 93.283582 139.0 94.557823 120.0 93.75 145.0 96.666667
496 20.0 14.925373 139.0 94.557823 120.0 93.75 52.0 34.666667
9 %_Sep 9 Result
id
494 116.0 100.000000
496 116.0 966.666667 1,5,8
本文收集自互联网,转载请注明来源。
如有侵权,请联系[email protected] 删除。
我来说两句