Getting a monthly average across multiple years

Posted by debugcn in Dev

Matt Pieper

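I'm trying to find the average number of "JOBS" for each month. For example, what is the average number of jobs created in May (May 2011 + May 2012 + May 2013 + May 2020), in June, and so on?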

NAME CLOSEDATE  JOBS  month year
A    2019-01-01 2     1     2019
B    2019-01-01 23    1     2019
C    2018-05-24 2     5     2018
D    2019-05-23 200   5     2019
E    2020-05-23 40    5     2020
F    2020-05-14 23    5     2020
G    2020-06-12 93    6     2020
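The answer's code can be run against this sample once it is rebuilt as a DataFrame. The sketch below assumes the frame is called proj (the name used in the question) and that CLOSEDATE needs to be parsed with pd.to_datetime, because the .dt accessors only work on a datetime column. How the original data was actually loaded is not stated.

```python
import pandas as pd

# Sample rows from the question; CLOSEDATE is parsed to datetime so .dt accessors work
proj = pd.DataFrame({
    "NAME": list("ABCDEFG"),
    "CLOSEDATE": pd.to_datetime([
        "2019-01-01", "2019-01-01", "2018-05-24", "2019-05-23",
        "2020-05-23", "2020-05-14", "2020-06-12",
    ]),
    "JOBS": [2, 23, 2, 200, 40, 23, 93],
})

# Derive the month/year helper columns shown in the table
proj["month"] = proj["CLOSEDATE"].dt.month
proj["year"] = proj["CLOSEDATE"].dt.year
print(proj)
```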

I tried: pd.pivot_table(proj, index=['month'], values=['JOBS'], aggfunc=[np.sum, np.mean]). This gives me the average jobs per record for each month, not the average of the monthly totals across years.
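As a rough check, here is the question's pivot_table call applied to the sample rows. It uses the string aliases "sum" and "mean" in place of np.sum and np.mean, which is assumed to be equivalent for this purpose. On this data, the mean column for May comes out at 66.25.

```python
import pandas as pd

proj = pd.DataFrame({
    "JOBS": [2, 23, 2, 200, 40, 23, 93],
    "month": [1, 1, 5, 5, 5, 5, 6],
    "year": [2019, 2019, 2018, 2019, 2020, 2020, 2020],
})

# One sum column and one per-record mean column per month
table = pd.pivot_table(proj, index=["month"], values=["JOBS"], aggfunc=["sum", "mean"])
print(table)
```

The mean here divides by the number of records in the month, not by the number of years. The two figures differ only when a month has several records in the same year, as May 2020 does.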

With the sample data above, I would ideally get 66.25 jobs for May: (2 + 200 + 40 + 23) / 4.
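The word "average" can be read two ways on this sample. Writing T_y for the total May jobs in year y (so T_2018 = 2, T_2019 = 200, T_2020 = 40 + 23), the two readings for May work out as follows:

```latex
\bar{J}_{\text{per record}} = \frac{2 + 200 + 40 + 23}{4} = \frac{265}{4} = 66.25
\qquad
\bar{J}_{\text{per year}} = \frac{T_{2018} + T_{2019} + T_{2020}}{3} = \frac{2 + 200 + 63}{3} = \frac{265}{3} \approx 88.33
```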

I feel like I'm missing something simple, or a way to format the table like this:

Year Jan   Feb   Mar ..... Dec
2011 1000  4322  5322      2343
2012 3423  4322  5322      2343
...  1645  4322  5322      2343
2020 7895  3432  9999      2343
AVG. 3491  4099  6491
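One way to get roughly that layout is to pivot yearly totals into a year × month grid and append an AVG row. The sketch below assumes the existing month and year columns. It pads the columns to all twelve months, labels them with calendar.month_abbr, and uses the row label "AVG" for illustration. Empty cells are NaN and are skipped when averaging.

```python
import calendar

import pandas as pd

proj = pd.DataFrame({
    "JOBS": [2, 23, 2, 200, 40, 23, 93],
    "month": [1, 1, 5, 5, 5, 5, 6],
    "year": [2019, 2019, 2018, 2019, 2020, 2020, 2020],
})

# Rows = year, columns = month, cells = total JOBS closed in that year/month
wide = proj.pivot_table(index="year", columns="month", values="JOBS", aggfunc="sum")

# Show every month Jan..Dec, even ones with no records (those stay NaN)
wide = wide.reindex(columns=range(1, 13))
wide = wide.rename(columns=lambda m: calendar.month_abbr[m])

# AVG row: mean of each month's yearly totals; NaN cells are skipped
wide.loc["AVG"] = wide.mean()
print(wide)
```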

里奇

Update

First get the total for each year/month, then average those totals across years.

import pandas as pd

# The .dt accessors below require CLOSEDATE to be a datetime column
proj['CLOSEDATE'] = pd.to_datetime(proj['CLOSEDATE'])

by_month = (
    proj.groupby([proj.CLOSEDATE.dt.year, proj.CLOSEDATE.dt.month])  # group by (year, month)
    .JOBS.sum()      # select only the JOBS column and sum it within each year/month
    .unstack(0)      # move the year level of the MultiIndex into the columns
    .mean(axis=1)    # average across the years (now columns); NaN cells are skipped
    .rename('avg_jobs')
    .rename_axis('month')
)

print(by_month)
month
1    25.000000
5    88.333333
6    93.000000
Name: avg_jobs, dtype: float64
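Because mean(axis=1) skips NaN, each month above is averaged only over the years in which it has records. For example, January is averaged over 2019 alone. If a year with no records for a month should count as 0 jobs, one variant is to pass fill_value=0 to unstack. This is a sketch under that assumption. It only covers years that appear somewhere in the data.

```python
import pandas as pd

proj = pd.DataFrame({
    "CLOSEDATE": pd.to_datetime([
        "2019-01-01", "2019-01-01", "2018-05-24", "2019-05-23",
        "2020-05-23", "2020-05-14", "2020-06-12",
    ]),
    "JOBS": [2, 23, 2, 200, 40, 23, 93],
})

# Named keys so the index levels are called "year" and "month"
year = proj.CLOSEDATE.dt.year.rename("year")
month = proj.CLOSEDATE.dt.month.rename("month")

# Totals per (year, month), then years as columns; a missing year/month becomes 0
totals = proj.groupby([year, month]).JOBS.sum().unstack("year", fill_value=0)

# Average over every year present in the data, counting empty months as 0 jobs
avg_with_zeros = totals.mean(axis=1).rename("avg_jobs")
print(avg_with_zeros)
```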

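This gives you the average total jobs per month (averaged across years). Note that you don't need separate year/month columns for this; keep them only if you want to use them for other calculations. If what you actually want is the plain per-record average for each month (which is what (2 + 200 + 40 + 23) / 4 = 66.25 in the question computes), group on the month column directly: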

by_month = (
    proj.groupby('month') # create the groupby object
    .JOBS.mean() # select only the JOBS column and aggregate by mean
)

print(by_month)
month
1    12.50
5    66.25
6    93.00
Name: JOBS, dtype: float64
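To compare the two numbers side by side, here is a sketch that reuses the existing month and year columns. The output column names avg_of_yearly_totals, per_record_mean, and years_with_data are illustrative only.

```python
import pandas as pd

proj = pd.DataFrame({
    "JOBS": [2, 23, 2, 200, 40, 23, 93],
    "month": [1, 1, 5, 5, 5, 5, 6],
    "year": [2019, 2019, 2018, 2019, 2020, 2020, 2020],
})

# Step 1: total JOBS per (year, month)
yearly = proj.groupby(["year", "month"], as_index=False)["JOBS"].sum()

# Step 2: per month, average the yearly totals; the raw per-record mean is kept for comparison
summary = pd.DataFrame({
    "avg_of_yearly_totals": yearly.groupby("month")["JOBS"].mean(),
    "per_record_mean": proj.groupby("month")["JOBS"].mean(),
    "years_with_data": yearly.groupby("month")["year"].nunique(),
})
print(summary)
```

The years_with_data column shows how many yearly totals went into each average. This makes it easier to spot months like January or June that rest on a single year.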
