I have a four-column matrix: a chronologically ordered index followed by three columns of names (strings). Here is some toy data:
x = rbind(c(1,"sam","harry","joe"), c(2,"joe","sam","jack"),c(3,"jack","joe","jill"),c(4,"harry","jill","joe"))
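I want to create three additional vectors that count, for each row, the number of previous (but not subsequent) occurrences of that name. This is the desired result for the toy data: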
y = rbind(c(0,0,0),c(1,1,0),c(1,2,0),c(1,1,3))
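I don't know how to approach this, and I have searched Stack Overflow for related examples. There are dplyr answers for finding totals, but (as far as I can tell) not row by row.
I tried writing a function that handles this for a single column, but had no luck, i.e.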
thing = sapply(x,function(i)length(grep(i,x[x[1:i]])))
Any hints would be greatly appreciated.
This is a typical ave + seq_along type of problem, but we need to convert the data to a vector first:
t(`dim<-`(ave(rep(1, prod(dim(x[, -1]))),
c(t(x[, -1])), FUN = seq_along) - 1,
rev(dim(x[, -1]))))
#      [,1] [,2] [,3]
# [1,]    0    0    0
# [2,]    1    1    0
# [3,]    1    2    0
# [4,]    1    1    3
Perhaps more readable:
## x without the first column as a vector
x_vec <- c(t(x[, -1]))
## The values that you are looking to obtain...
y_vals <- ave(rep(1, length(x_vec)), x_vec, FUN = seq_along) - 1
## ... in the format you want to obtain them
matrix(y_vals, ncol = ncol(x) - 1, byrow = TRUE)
#      [,1] [,2] [,3]
# [1,]    0    0    0
# [2,]    1    1    0
# [3,]    1    2    0
# [4,]    1    1    3
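For reuse, the same steps could be wrapped in a small function and checked against the desired result from the question. This is only a sketch: the names count_prior, demo and demo_counts are made up for illustration and are not part of the answer above. The short demo vector at the top shows what the ave + seq_along idiom does on its own.

```r
# Toy data from the question: an index column plus three name columns
x <- rbind(c(1, "sam", "harry", "joe"),
           c(2, "joe", "sam", "jack"),
           c(3, "jack", "joe", "jill"),
           c(4, "harry", "jill", "joe"))

# Small illustration of the idiom: within each group, seq_along() numbers
# the occurrences 1, 2, 3, ...; subtracting 1 leaves the count of earlier ones
demo <- c("a", "b", "a", "a", "b")
demo_counts <- ave(rep(1, length(demo)), demo, FUN = seq_along) - 1

# Hypothetical helper (an assumption, not from the original answer):
# count prior occurrences of each name, reading the matrix row by row,
# left to right
count_prior <- function(m) {
  v <- c(t(m))                                   # flatten in row-major order
  counts <- ave(rep(1, length(v)), v, FUN = seq_along) - 1
  matrix(counts, ncol = ncol(m), byrow = TRUE)   # restore the original shape
}

result <- count_prior(x[, -1])

# Desired result from the question
y <- rbind(c(0, 0, 0), c(1, 1, 0), c(1, 2, 0), c(1, 1, 3))
stopifnot(all(result == y))
```

Because the matrix is flattened row by row, a name appearing further left in the same row would also count as a prior occurrence. No name repeats within a row in the toy data, so this does not affect the result here.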