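In my dataset I have 10 columns filled with CR, II and RAND. I want to create 2 more columns: one holding the value of the first block that is not "RAND", and another recording that block's number. For example, if an ID has "RAND" in block_1 and block_2 and "II" in block_3, I want the first new column to be "II" and the second new column to be "3". How can I achieve this?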
Below is a sample data frame.
set.seed(2288)
# 10 rows x 10 block columns, each cell drawn from "II", "RAND" and "CR"
dff <- data.frame(replicate(10, sample(c("II", "RAND", "CR"), 10, replace = TRUE)))
# Random IDs such as "CIKJH1554S": 5 letters, 4 digits, 1 letter
myFun <- function(n = 5000) {
  a <- do.call(paste0, replicate(5, sample(LETTERS, n, TRUE), simplify = FALSE))
  paste0(a, sprintf("%04d", sample(9999, n, TRUE)), sample(LETTERS, n, TRUE))
}
dff$ID <- myFun(10)
dff <- dff[, c(11, 1:10)]                    # move ID to the first column
# lapply() keeps the factor class; sapply() would simplify to a character matrix
dff[2:11] <- lapply(dff[2:11], as.factor)
names(dff)[2:11] <- paste0("block_", 1:10)   # X1..X10 -> block_1..block_10
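# Return the first value in a row that differs from `val`, plus its block position.
# val = "0" matches the 0/1/2 data the output below was produced from;
# for the question's data, use val = "RAND".
# Caveat: if every entry equals `val`, which.max() still returns 1.
func <- function(..., val = "0") {
  dat <- unlist(list(...))
  ind <- which.max(dat != val)   # first TRUE = first entry that is not `val`
  list(dat[[ind]], ind)
}
# Map() calls func once per row, passing one element from each block column
setNames(
  do.call(rbind.data.frame, do.call(Map, c(list(f = func), dff[, -1]))),
  c("val", "ind"))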
# val ind
# 2 2 1
# 1 1 1
# 11 1 1
# 21 2 1
# 0 1 2
# 12 1 1
# 01 1 2
# 02 2 3
# 22 2 1
# 03 2 4
(The result can be cbind-ed onto the data.)
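A minimal sketch of the same Map() approach on data shaped like the question's, assuming the blocks hold the strings "II", "RAND" and "CR" so that the value to skip is "RAND". The helper first_non_rand and the two-row data frame toy are hypothetical: row 1 reproduces the question's example (RAND in block_1 and block_2, II in block_3), and the result is attached with cbind().

```r
# Hypothetical helper: same logic as func, but skipping "RAND" by default
first_non_rand <- function(..., val = "RAND") {
  dat <- unlist(list(...))
  ind <- which.max(dat != val)   # position of the first entry that is not `val`
  list(dat[[ind]], ind)
}

# Hypothetical data: row 1 is the case described in the question
toy <- data.frame(
  ID      = c("A", "B"),
  block_1 = c("RAND", "CR"),
  block_2 = c("RAND", "RAND"),
  block_3 = c("II",   "II"),
  stringsAsFactors = FALSE
)

res <- setNames(
  do.call(rbind.data.frame, do.call(Map, c(list(f = first_non_rand), toy[, -1]))),
  c("val", "ind")
)
toy <- cbind(toy, res)   # expected: val = c("II", "CR"), ind = c(3, 1)
toy
```

Applying the helper to dff[, -1] instead of toy[, -1] should give the two columns the question asks for.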
Alternatively, find the index first and then retrieve the value. This may be a cleaner approach:
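# For this variant, func returns only the index of the first entry that is not `val`
func <- function(..., val = "0") {
  which.max(unlist(list(...)) != val)
}
do.call(mapply, c(list(FUN = func), dff[, -1]))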
# 2.block_1 1.block_1 1.block_1 2.block_1 0.block_2 1.block_1 0.block_2 0.block_3 2.block_1 0.block_4
# 1 1 1 1 2 1 2 3 1 4
dff$ind <- do.call(mapply, c(list(FUN = func), dff[, -1]))
# Matrix indexing: for each row i, pick block column ind[i]
dff$val <- dff[, -1][cbind(seq_len(nrow(dff)), dff$ind)]
dff
# ID block_1 block_2 block_3 block_4 block_5 block_6 block_7 block_8 block_9 block_10 ind val
# 1 CIKJH1554S 2 1 0 0 2 2 1 2 1 1 1 2
# 2 URADX4138B 1 1 2 1 1 1 2 0 0 2 1 1
# 3 BWYCA9574K 1 0 1 1 2 1 2 1 1 1 1 1
# 4 FKBFM4773W 2 0 0 1 1 1 2 1 0 1 1 2
# 5 LTTTI7549S 0 1 0 1 1 0 2 2 1 2 2 1
# 6 OJDSI8401L 1 1 1 2 2 0 0 1 0 0 1 1
# 7 IAUKO4799A 0 1 0 1 1 1 1 2 0 2 2 1
# 8 WBJPE0696J 0 0 2 0 0 1 0 0 0 2 3 2
# 9 FNFQC9244G 2 1 0 2 1 0 2 1 2 1 1 2
# 10 WQTRB4780S 0 0 0 2 1 2 2 0 2 2 4 2
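The index-then-value idea can also be vectorized over the whole block matrix instead of calling a function once per row. The sketch below swaps in base R's max.col(), a different technique from the Map()/mapply() calls above. It assumes the blocks are strings with "RAND" as the value to skip, and, unlike which.max(), it returns NA for a row in which every block is "RAND". The function first_non_rand_vec and the data frame blocks are hypothetical.

```r
# Hypothetical helper: vectorized "first block that is not `skip`"
first_non_rand_vec <- function(blocks, skip = "RAND") {
  m <- as.matrix(blocks)                         # character matrix, one row per ID
  keep <- (m != skip) * 1                        # 1 where the block is not `skip`, else 0
  ind <- max.col(keep, ties.method = "first")    # first column holding the row maximum
  ind[rowSums(keep) == 0] <- NA                  # all-`skip` rows have no answer
  val <- m[cbind(seq_len(nrow(m)), ind)]         # matrix indexing; an NA index gives NA
  data.frame(val = val, ind = ind, stringsAsFactors = FALSE)
}

# Hypothetical data: row 3 is all "RAND"
blocks <- data.frame(
  block_1 = c("RAND", "CR",   "RAND"),
  block_2 = c("RAND", "RAND", "RAND"),
  block_3 = c("II",   "II",   "RAND"),
  stringsAsFactors = FALSE
)
res <- first_non_rand_vec(blocks)   # val: "II" "CR" NA; ind: 3 1 NA
res
```

On the question's data this could be attached with cbind(dff, first_non_rand_vec(dff[2:11])); as.matrix() turns factor columns into character, so the comparison with "RAND" should work either way.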