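I have a CSV file with 1-minute stock data spanning several days. Each day runs from 9:30 to 16:00.
Some minutes are missing from the time series (here 2013-09-16 09:32:00 and 2013-09-17 09:31:00 are missing):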
2013-09-16 09:30:00,461.01,461.49,461,461,183507
2013-09-16 09:31:00,460.82,461.6099,460.39,461.07,212774
2013-09-16 09:33:00,460.0799,460.88,458.97,459.2401,207880
2013-09-16 09:34:00,458.97,460.08,458.8,460.04,148121
...
2013-09-16 15:59:00,449.72,450.0774,449.59,449.95,146399
2013-09-16 16:00:00,450.12,450.12,449.65,449.65,444594
2013-09-17 09:30:00,448,448,447.5,447.96,173624
2013-09-17 09:32:00,450.6177,450.9,449.05,449.2701,268715
2013-09-17 09:33:00,451.39,451.96,450.58,450.7061,197019
...
...
Using pandas, how can I forward-fill the series so that every minute is present? It should look like this:
2013-09-16 09:30:00,461.01,461.49,461,461,183507
2013-09-16 09:31:00,460.82,461.6099,460.39,461.07,212774
2013-09-16 09:32:00,460.82,461.6099,460.39,461.07,212774 <-- forward filled
2013-09-16 09:33:00,460.0799,460.88,458.97,459.2401,207880
2013-09-16 09:34:00,458.97,460.08,458.8,460.04,148121
...
2013-09-16 15:59:00,449.72,450.0774,449.59,449.95,146399
2013-09-16 16:00:00,450.12,450.12,449.65,449.65,444594
2013-09-17 09:30:00,448,448,447.5,447.96,173624
2013-09-17 09:31:00,448,448,447.5,447.96,173624 <-- forward filled
2013-09-17 09:32:00,450.6177,450.9,449.05,449.2701,268715
2013-09-17 09:33:00,451.39,451.96,450.58,450.7061,197019
...
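It also needs to handle the case where several consecutive minutes are missing...
So I copied your first 4 rows into a DataFrame (with column 0 parsed as datetimes):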
Out[49]:
0 1 2 3 4 5
0 2013-09-16 09:30:00 461.0100 461.4900 461.00 461.0000 183507
1 2013-09-16 09:31:00 460.8200 461.6099 460.39 461.0700 212774
2 2013-09-16 09:33:00 460.0799 460.8800 458.97 459.2401 207880
3 2013-09-16 09:34:00 458.9700 460.0800 458.80 460.0400 148121
Then:
df1 = df.set_index(keys=[0]).resample('1min').ffill()  # pandas < 0.18 spelled this resample('1min', fill_method='ffill'); fill_method has since been removed
df1
Out[52]:
1 2 3 4 5
0
2013-09-16 09:30:00 461.0100 461.4900 461.00 461.0000 183507
2013-09-16 09:31:00 460.8200 461.6099 460.39 461.0700 212774
2013-09-16 09:32:00 460.8200 461.6099 460.39 461.0700 212774
2013-09-16 09:33:00 460.0799 460.8800 458.97 459.2401 207880
2013-09-16 09:34:00 458.9700 460.0800 458.80 460.0400 148121
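For anyone on a current pandas release, here is a minimal self-contained sketch of the same idea. It assumes the CSV has no header row (so pandas numbers the columns 0 to 5) and reads the question's sample rows from an in-memory string instead of a file; `csv_text` is just an illustrative name.

```python
import io

import pandas as pd

# The first four rows from the question; 09:32:00 is missing.
csv_text = """\
2013-09-16 09:30:00,461.01,461.49,461,461,183507
2013-09-16 09:31:00,460.82,461.6099,460.39,461.07,212774
2013-09-16 09:33:00,460.0799,460.88,458.97,459.2401,207880
2013-09-16 09:34:00,458.97,460.08,458.8,460.04,148121
"""

# header=None: columns are numbered 0..5.
# parse_dates=[0] makes column 0 datetime64; index_col=0 turns it into a DatetimeIndex.
df = pd.read_csv(io.StringIO(csv_text), header=None, parse_dates=[0], index_col=0)

# Upsample to one row per minute and copy the last known row into each gap.
df1 = df.resample("1min").ffill()
print(df1)
```

If column 0 were left as plain strings, `resample` would refuse to run, which is why the dates are parsed on read.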
This also handles several consecutive missing minutes and forward-fills all of them.
So if I have data like:
2013-09-17 09:30:00,448,448,447.5,447.96,173624
2013-09-17 09:33:00,451.39,451.96,450.58,450.7061,197019
and do the same as before, I get:
Out[55]:
1 2 3 4 5
0
2013-09-17 09:30:00 448.00 448.00 447.50 447.9600 173624
2013-09-17 09:31:00 448.00 448.00 447.50 447.9600 173624
2013-09-17 09:32:00 448.00 448.00 447.50 447.9600 173624
2013-09-17 09:33:00 451.39 451.96 450.58 450.7061 197019
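One thing worth checking when the file spans several days: `resample` builds a single continuous minute grid, so it also creates the overnight minutes (16:01 through 09:29 of the next day), which the desired output in the question does not contain. The sketch below is one possible way to trim them, assuming the session is exactly 09:30 to 16:00 and that every trading day appears at least once in the input; `between_time` keeps the session hours, and the extra filter on calendar days is there in case the data spans a weekend or holiday.

```python
import io

import pandas as pd

# End of 2013-09-16 and start of 2013-09-17 from the question; 09:31:00 on the 17th is missing.
csv_text = """\
2013-09-16 15:59:00,449.72,450.0774,449.59,449.95,146399
2013-09-16 16:00:00,450.12,450.12,449.65,449.65,444594
2013-09-17 09:30:00,448,448,447.5,447.96,173624
2013-09-17 09:32:00,450.6177,450.9,449.05,449.2701,268715
"""
df = pd.read_csv(io.StringIO(csv_text), header=None, parse_dates=[0], index_col=0)

# Continuous minute grid: this also contains every overnight minute,
# each filled with the 16:00 bar.
filled = df.resample("1min").ffill()

# Keep regular-session minutes only (both endpoints are included by default).
session = filled.between_time("09:30", "16:00")

# between_time() alone would keep 09:30-16:00 on weekends/holidays if the data
# crossed them, so also keep only calendar days that occur in the input.
trading_days = df.index.normalize().unique()
session = session[session.index.normalize().isin(trading_days)]
print(session)
```

A side effect of this design: if a day's 09:30 bar were itself missing, it would be filled from the previous day's 16:00 bar, which matches plain forward-filling but may or may not be what you want.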
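The key here is that you must have a DatetimeIndex. If you also want to keep the timestamp as a column, you can set drop=False in set_index; note, though, that this column is forward-filled like any other, so in the filled rows it shows the previous minute's timestamp. Calling reset_index() on the resampled result gives a timestamp column that matches every row.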
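A small sketch of the difference between the two ways of keeping the timestamp as a column, using three of the question's rows (same no-header assumption as before; `kept` and `as_column` are illustrative names):

```python
import io

import pandas as pd

csv_text = """\
2013-09-16 09:30:00,461.01,461.49,461,461,183507
2013-09-16 09:31:00,460.82,461.6099,460.39,461.07,212774
2013-09-16 09:33:00,460.0799,460.88,458.97,459.2401,207880
"""
# Column 0 stays a regular (datetime64) column here.
df = pd.read_csv(io.StringIO(csv_text), header=None, parse_dates=[0])

# drop=False keeps column 0 alongside the index, but ffill copies it like any
# other column, so the new 09:32 row still holds 09:31:00 in column 0.
kept = df.set_index(0, drop=False).resample("1min").ffill()

# Resample first, then move the (correct) index back into column 0.
as_column = df.set_index(0).resample("1min").ffill().reset_index()
print(as_column)
```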