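I have a list containing several data frames ("mylist") and a separate data frame ("mydf"). With these two, I have two problems that I need to solve in R.
The actual list contains many data frames, and the actual data frame has 10000 rows. Only sample data is shown here.
First problem: I have a list containing several data frames. The following list is an example: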
mylist1 <- list(a = data.frame(ID = c("a_1", "b_1", "c_1", "d_1", "e_1", "f_1"), colb = c(3.67, 4.94, 8.11, 2.85, 9.53, 7.5), colc = c(3.45, 6.19, 4.96, 6.73, 9.26, 8.62)),
b = data.frame(cola = c("a_1", "b_1", "c_1", "d_1", "e_1", "f_1"), colb = c(5.24, 3.62, 0.29, 6.65, 7.86, 8.7), colc = c(7.03, 7.51, 0.842, 3.56, 8.68, 5.844)))
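I want to subset the rows of each data frame in the list based on the values in column 'colc': for every data frame in the list, keep only the rows where 'colc' >= 6.
The expected output for mylist1 is as follows: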
mylistoutput <- list(a = data.frame(ID = c("b_1", "d_1", "e_1", "f_1"), colb = c(4.94, 2.85, 9.53, 7.5), colc = c(6.19, 6.73, 9.26, 8.62)),
b = data.frame(cola = c("a_1", "b_1", "e_1"), colb = c(5.24, 3.62, 7.86), colc = c(7.03, 7.51, 8.68)))
I tried subsetting the rows with a condition using filter/subset, as shown below:
mylistoutput <- lapply(mylist, function(x) filter(x$colc >= 6))
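But it failed...
Second problem: starting from "mylistoutput", I want to do two things.
First, for the first data frame in "mylistoutput", I want to match the IDs in its "ID" column against the IDs in the data frame "mydf".
A sample of the data frame "mydf" is as follows: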
mydf <- data.frame(ID = c("a_1","a_1","a_1","a_1","a_1", "b_1","b_1","b_1","b_1", "c_1","c_1","c_1", "d_1","d_1","d_1", "e_1","e_1","e_1","e_1","e_1", "f_1","f_1","f_1","g_1","g_1","g_1","g_1","g_1"), colb = c(3.67,1,2.3,2.5,5, 1.1,2.2,3.7,4.94, 8.11,1.23,2, 2.85,1,2, 5,4,9.53,4,5, 8,7,7.5, 1,2,3,4,5), colc = c(3.45,1,2,3,4, 6.19,1,2,3, 4.96,1,2, 6.73,1,2, 9.26,1,2,3,4, 8.62,1,2, 1,2,3,4,5))
Now I want to extract from "mydf" all rows whose ID matches an ID in the first data frame of "mylistoutput".
The expected output for "mydf" is as follows:
mydfoutput1 <- data.frame(ID = c("b_1","b_1","b_1","b_1", "d_1","d_1","d_1", "e_1","e_1","e_1","e_1","e_1", "f_1","f_1","f_1"), colb = c(1.1,2.2,3.7,4.94, 2.85,1,2, 5,4,9.53,4,5, 8,7,7.5), colc = c(6.19,1,2,3, 6.73,1,2, 9.26,1,2,3,4, 8.62,1,2))
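Second, I want to pick the IDs that are common to all data frames in the "mylistoutput" list. For example, "b_1" and "e_1" are the IDs shared by both data frames in "mylistoutput". I then want to subset the rows of "mydf" with those same IDs, i.e. "b_1" and "e_1".
The expected output is as follows: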
mydfoutput2 <- data.frame(ID = c("b_1","b_1","b_1","b_1", "e_1","e_1","e_1","e_1","e_1"), colb = c(1.1,2.2,3.7,4.94, 5,4,9.53,4,5), colc = c(6.19,1,2,3, 9.26,1,2,3,4))
I am looking for code to solve the problems above.
We can use lapply with subset:
out <- lapply(mylist1, subset, subset = colc >=6)
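For readers who prefer to avoid the non-standard evaluation in subset, the same row filter can be sketched with plain bracket indexing. This is only an illustrative alternative, not the approach used above; the helper name keep_rows is hypothetical, and which() is used so that rows with NA in colc are dropped, as subset would do.

```r
# Sample list from the question
mylist1 <- list(
  a = data.frame(ID = c("a_1", "b_1", "c_1", "d_1", "e_1", "f_1"),
                 colb = c(3.67, 4.94, 8.11, 2.85, 9.53, 7.5),
                 colc = c(3.45, 6.19, 4.96, 6.73, 9.26, 8.62)),
  b = data.frame(cola = c("a_1", "b_1", "c_1", "d_1", "e_1", "f_1"),
                 colb = c(5.24, 3.62, 0.29, 6.65, 7.86, 8.7),
                 colc = c(7.03, 7.51, 0.842, 3.56, 8.68, 5.844))
)

# Hypothetical helper: keep the rows of one data frame where colc >= 6.
# which() turns the logical test into row positions and silently drops NAs.
keep_rows <- function(df) df[which(df$colc >= 6), , drop = FALSE]

# Apply the helper to every data frame in the list
out <- lapply(mylist1, keep_rows)

out$a$ID    # "b_1" "d_1" "e_1" "f_1"
out$b$cola  # "a_1" "b_1" "e_1"
```

The result keeps the list names (a, b), so out can be used in place of the subset-based version in the steps that follow.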
For the second case, we can match the IDs with %in%:
subset(mydf, ID %in% out[[1]]$ID)
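To see what %in% contributes here, the following minimal sketch runs the same kind of match on a small made-up data frame. The names toy and ids_wanted are assumptions for illustration and are not part of the question's data.

```r
# Toy data, invented for illustration only
toy <- data.frame(ID = c("a_1", "b_1", "b_1", "d_1", "g_1"),
                  val = 1:5)
ids_wanted <- c("b_1", "d_1")

# %in% returns one TRUE/FALSE per row of toy: is this row's ID among ids_wanted?
is_match <- toy$ID %in% ids_wanted
is_match  # FALSE TRUE TRUE TRUE FALSE

# Using that logical vector as a row index keeps every matching row,
# including repeated IDs
matched <- toy[is_match, , drop = FALSE]
matched$val  # 2 3 4
```

Because the test is evaluated per row of the larger data frame, every repeated ID in it is retained, which is what the expected mydfoutput1 requires.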
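For the third case, use Reduce with intersect to find the IDs common to all data frames (the first column of each data frame holds the IDs):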
subset(mydf, ID %in% Reduce(intersect, lapply(out, `[[`, 1)))
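The Reduce step can be examined in isolation. The sketch below folds intersect over a list of ID vectors; the first two vectors are the filtered IDs from the question, while the third is a made-up extra to show that the same call scales beyond two data frames.

```r
# ID vectors: the first two come from the filtered sample data,
# the third is invented to show the fold over more than two sets
id_sets <- list(
  c("b_1", "d_1", "e_1", "f_1"),
  c("a_1", "b_1", "e_1"),
  c("e_1", "b_1", "z_1")
)

# Reduce applies intersect pairwise, left to right:
# intersect(intersect(set1, set2), set3)
common_ids <- Reduce(intersect, id_sets)
common_ids  # "b_1" "e_1"
```

Since the real list contains many data frames, folding with Reduce avoids writing out a nested intersect call for each one.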
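As for filter from dplyr: it expects a data.frame as its first argument, not a vector, so the attempt in the question fails. Pass the data frame first and the condition second:
library(dplyr)
lapply(mylist1, function(x) filter(x, colc >= 6))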