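I have a dataset that looks like the one below. Column names combine a time suffix (.t0 and .t1), a scale name (this and that), and an item identifier (1, 22, 22a). There are also other columns (v2, v3, ignore.t0, ignore.t1, this.t0, this.t1, that.t0, that.t1).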
dat <- data.frame(id = seq(from=1, to=10, by=1),
v2 = rnorm(10),
v3 = rnorm(10),
ignore.t0 = rnorm(10),
this.t0 = rnorm(10),
this1.t0 = rnorm(10),
this22.t0 = rnorm(10),
this22a.t0 = rnorm(10),
that.t0 = rnorm(10),
that1.t0 = rnorm(10),
that22.t0 = rnorm(10),
that22a.t0 = rnorm(10),
ignore.t1 = rnorm(10),
this.t1 = rnorm(10),
this1.t1 = rnorm(10),
this22.t1 = rnorm(10),
this22a.t1 = rnorm(10),
that.t1 = rnorm(10),
that1.t1 = rnorm(10),
that22.t1 = rnorm(10),
that22a.t1 = rnorm(10))
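I want to subset the data frame so that it contains id and only the columns whose names have a scale name (this or that) AND, right before the period, either a number (1.) or a number and a letter (22a.). So in the end, the data frame should look like this: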
dat2 <- data.frame(
id = seq(from=1, to=10, by=1),
#v2 = rnorm(10),
#v3 = rnorm(10),
#ignore.t0 = rnorm(10),
#this.t0 = rnorm(10),
this1.t0 = rnorm(10),
this22.t0 = rnorm(10),
this22a.t0 = rnorm(10),
#that.t0 = rnorm(10),
that1.t0 = rnorm(10),
that22.t0 = rnorm(10),
that22a.t0 = rnorm(10),
#ignore.t1 = rnorm(10),
#this.t1 = rnorm(10),
this1.t1 = rnorm(10),
this22.t1 = rnorm(10),
this22a.t1 = rnorm(10),
#that.t1 = rnorm(10),
that1.t1 = rnorm(10),
that22.t1 = rnorm(10),
that22a.t1 = rnorm(10))
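The data frame is much larger than shown here, so typing out the column indices is not feasible. Matching on the scale names alone does not work either, because this.t0, this.t1, that.t0, and that.t1 would be captured as well.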
# not quite right
dat2$id <- dat$id
scales <- c("this", "that")
keep.index <- grep(paste(scales,collapse="|"), names(dat))
temp <- dat[keep.index]
dat2 <- cbind(dat2, temp)
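How can I modify the grep pattern to look for a number OR (a number and a letter) before the period? Or is there a better approach altogether?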
This works for your example:
dat[c("id", grep("(this|that)\\d+[a-z]?\\.", names(dat), value = TRUE))]
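where:
\\d+ matches one or more digits
[a-z]? matches zero or one lowercase letter
\\. matches a literal dot
If you want to build the pattern dynamically for various scales, you can do the following: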
scales <- c("this", "that")
pattern <- sprintf("(%s)\\d+[a-z]?\\.", paste(scales, collapse = "|"))
dat[c("id", grep(pattern, names(dat), value = TRUE))]
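As a quick sanity check, the dynamically built pattern can be tried on a plain character vector before applying it to the full data frame. The sketch below uses a hypothetical vector `cols` that mirrors the column layout from the question. It also adds a leading `^` anchor, which is an assumption on top of the answer above: it would stop a column such as `xthis1.t0` from matching, should such names exist in the real data.

```r
# Hypothetical column names mirroring the layout in the question
cols <- c("id", "v2", "ignore.t0", "this.t0", "this1.t0", "this22.t0",
          "this22a.t0", "that.t1", "that1.t1", "that22a.t1")

scales <- c("this", "that")

# Same pattern as above, plus a "^" anchor (an assumption, not part of the
# original answer) so the scale name must start the column name
pattern <- sprintf("^(%s)\\d+[a-z]?\\.", paste(scales, collapse = "|"))

# value = TRUE returns the matching names rather than their positions
kept <- grep(pattern, cols, value = TRUE)
print(kept)
```

Only the item columns (scale name, then digits, then an optional lowercase letter, then a dot) should survive. `this.t0` and `that.t1` are dropped because `\\d+` requires at least one digit between the scale name and the dot.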