让我们先来看一下lm
。我有一个连续的解释$ X $和一个因子$ F $建模季节性方面(在示例8个级别中)。
假设$ \ beta $表示$ X $的斜率,那么我想对斜率与因子的相互作用进行建模。这是某种物理模型,因此我假设交互仅对8个级别中的2个有意义。如何制定呢?我想使用一个普通公式,稍后再将其放入AER
包(函数tobit
)中经过审查的回归中
数据为:
N = 50
f = rep(c("s1","s2","s3","s4","s5","s6","s7","s8"),N)
fcoeff = rep(c(-1,-2,-3,-4,-3,-5,-10,-5),N)
beta = rep(c(5,5,5,8,4,5,5,5),N)
set.seed(100)
x = rnorm(8*N)+1
epsilon = rnorm(8*N,sd = sqrt(1/5))
y = x*beta+fcoeff+epsilon
适合所有互动,提供准确的结果
fit <- lm(y~0+x+x*f)
summary(fit)
Call:
lm(formula = y ~ 0 + x + x * f)
Residuals:
Min 1Q Median 3Q Max
-1.41018 -0.30296 0.01818 0.32657 1.20677
Coefficients:
Estimate Std. Error t value Pr(>|t|)
x 5.039064 0.075818 66.463 <2e-16 ***
fs1 -0.945112 0.088072 -10.731 <2e-16 ***
fs2 -2.107483 0.103590 -20.344 <2e-16 ***
fs3 -2.992401 0.088164 -33.941 <2e-16 ***
fs4 -4.054411 0.094878 -42.733 <2e-16 ***
fs5 -2.730448 0.094815 -28.798 <2e-16 ***
fs6 -5.232721 0.102254 -51.174 <2e-16 ***
fs7 -9.969175 0.096307 -103.515 <2e-16 ***
fs8 -4.922782 0.092917 -52.980 <2e-16 ***
x:fs2 -0.006081 0.097748 -0.062 0.950
x:fs3 -0.050684 0.102124 -0.496 0.620
x:fs4 2.988702 0.103652 28.834 <2e-16 ***
x:fs5 -1.196775 0.105139 -11.383 <2e-16 ***
x:fs6 0.099112 0.103811 0.955 0.340
x:fs7 -0.007648 0.110908 -0.069 0.945
x:fs8 -0.107148 0.094346 -1.136 0.257
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4705 on 384 degrees of freedom
Multiple R-squared: 0.9942, Adjusted R-squared: 0.994
F-statistic: 4120 on 16 and 384 DF, p-value: < 2.2e-16
如何与s4
和进行互动建模s5
?我可以从拟合中删除其他相互作用以进行进一步的预测吗?
我尝试将因子分解为2,但是模型变得单一:
f = rep(c("s1","s2","s3","s4","s5","s6","s7","s8"),N)
fcoeff = rep(c(-1,-2,-3,-4,-3,-5,-10,-5),N)
f2 = rep(c("s1","s2","s3","s4","s5","s6","s7","s8"),N)
f[f %in% c("s4","s5")] <- "no.inter"
f2[f2 %in% c("s1","s2","s3","s6","s7","s8")] <- "rest"
fit <- lm(y~0+x+x*f2+ f)
summary(fit)
Call:
lm(formula = y ~ 0 + x + x * f2 + f)
Residuals:
Min 1Q Median 3Q Max
-1.41018 -0.31544 0.00653 0.31615 1.20670
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
x 5.01794 0.02756 182.106 <2e-16 ***
f2rest -5.02213 0.07381 -68.045 <2e-16 ***
f2s4 -4.05441 0.09495 -42.702 <2e-16 ***
f2s5 -2.73045 0.09488 -28.777 <2e-16 ***
fs1 4.09310 0.09480 43.177 <2e-16 ***
fs2 2.93401 0.09424 31.132 <2e-16 ***
fs3 2.00475 0.09456 21.201 <2e-16 ***
fs6 -0.07894 0.09419 -0.838 0.402
fs7 -4.93545 0.09452 -52.213 <2e-16 ***
fs8 NA NA NA NA
x:f2s4 3.00983 0.07591 39.651 <2e-16 ***
x:f2s5 -1.17565 0.07793 -15.086 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4709 on 389 degrees of freedom
Multiple R-squared: 0.9941, Adjusted R-squared: 0.994
F-statistic: 5983 on 11 and 389 DF, p-value: < 2.2e-16
最简单的方法可能是操纵模型矩阵以删除不需要的列:
xx <- model.matrix(y ~ 0 + x + x*f)
omit <- grep("[:]fs[^45]", colnames(xx))
xx <- xx[, -omit]
lm(y ~ 0 + xx)
输出:
Call:
lm(formula = y ~ 0 + xx)
Coefficients:
xxx xxfs1 xxfs2 xxfs3 xxfs4 xxfs5 xxfs6 xxfs7 xxfs8 xxx:fs4 xxx:fs5
5.018 -0.929 -2.088 -3.017 -4.054 -2.730 -5.101 -9.958 -5.022 3.010 -1.176
本文收集自互联网,转载请注明来源。
如有侵权,请联系[email protected] 删除。
我来说两句