我正在研究一组输入变量和一个响应变量(价格)之间的相关性。这些都是按时间顺序排列的。
1)是否有必要在输入变量为周期性(自回归)的情况下对曲线进行平滑处理?如果是这样,怎么办?
2)建立关联后,我想准确地量化输入变量如何影响响应变量。例如:“一旦X增加> 10%,那么6个月后的y就会增加2%。”
我应该查看哪个python库来实现这一点-尤其是要弄清两个相关事件之间的滞后时间?
例:
I already looked at: statsmodels.tsa.ARMA but it seems to deal with predicting only one variable over time. In scipy the covariance matrix can tell me about the correlation, but does not help with figuring out the lag time.
While part of the question is more statistics based, the bit about how to do it in Python seems at home here. I see that you've since decided to do this in R from looking at your question on Cross Validated, but in case you decide to move back to Python, or for the benefit of anyone else finding this question:
I think you were in the right area looking at statsmodels.tsa, but there's a lot more to it than just the ARMA package:
http://statsmodels.sourceforge.net/devel/tsa.html
In particular, have a look at statsmodels.tsa.vector_ar for modelling multivariate time series. The documentation for it is available here:
http://statsmodels.sourceforge.net/devel/vector_ar.html
The page above specifies that it's for working with stationary time series - I presume this means removing both trend and any seasonality or periodicity. The following link is ultimately readying a model for forecasting, but it discusses the Box-Jenkins approach for building a model, including making it stationary:
http://www.colorado.edu/geography/class_homepages/geog_4023_s11/Lecture16_TS3.pdf
You'll notice that link discusses looking for autocorrelations (ACF) and partial autocorrelations (PACF), and then using the Augmented Dickey-Fuller test to test whether the series is now stationary. Tools for all three can be found in statsmodels.tsa.stattools. Likewise, statsmodels.tsa.arma_process has ACF and PACF.
The above link also discusses using metrics like AIC to determine the best model; both statsmodels.tsa.var_model and statsmodels.tsa.ar_model include AIC (amongst other measures). The same measures seem to be used for calculating lag order in var_model, using select_order.
In addition, the pandas library is at least partially integrated into statsmodels and has a lot of time series and data analysis functionality itself, so will probably be of interest. The time series documentation is located here:
本文收集自互联网,转载请注明来源。
如有侵权,请联系[email protected] 删除。
我来说两句