I have a data frame, bp_example, which looks like this:
structure(list(Sequence = c("Sequence", "Sequence", "Sequence",
"Sequence", "Sequence", "Sequence", "Sequence", "Sequence", "Sequence",
"Sequence", "Sequence", "Sequence", "Sequence", "Sequence", "Sequence",
"Sequence", "Sequence", "Sequence", "Sequence", "Sequence", "Sequence",
"Sequence", "Sequence", "Sequence", "Sequence"), start = c(1,
2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
20, 21, 22, 23, 24, 25), end = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25),
score = c(-0.205, -0.229, -0.115, -0.427, -0.327, -0.543,
-0.717, -0.923, -1.241, -1.471, -1.737, -1.717, -1.247, -1.137,
-0.689, -0.731, -0.337, 0.091, 0.579, 0.93, 0.575, 0.128,
-0.036, -0.186, -0.259), residue = c("M", "D", "A", "R",
"M", "R", "E", "L", "S", "F", "K", "V", "V", "L", "L", "G",
"E", "G", "R", "V", "G", "K", "T", "S", "L"), epitope = c(".",
".", ".", ".", ".", ".", ".", ".", ".", ".", ".", ".", ".",
".", ".", ".", ".", ".", "E", "E", "E", ".", ".", ".", "."
)), .Names = c("Sequence", "start", "end", "score", "residue",
"epitope"), class = c("data.table", "data.frame"), row.names = c(NA,
-25L))
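I am not sure whether what I want is possible, but here it goes anyway. I want to loop through the column bp_example$epitope, and if there are more than 14 "E" in a row, i.e. 15 or more consecutive rows of that column contain "E", I would like the corresponding characters from the previous column (bp_example$residue) printed as a single string (factor).
Considering the example I gave, I would like the string MDARMRELSFKVVLLG to be printed (preferably stored as an element of a list or a data.frame).
I tried a while loop, but had no success at all.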
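Here is an option using data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1)) and create a run-length-id (rleid) column, 'grp', based on whether 'epitope' holds the value "E". Grouped by 'Sequence' and 'grp', we specify the logical condition in i (epitope == "E"), and if the number of rows in a group (.N) is greater than 14, we paste the 'residue' elements together.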
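library(data.table)
# df1 is the question's data (bp_example); setDT() converts it by reference.
# grp is a run id that increments each time (epitope == "E") switches value.
setDT(df1)[, grp := rleid(epitope == "E")][epitope == "E",
  .(residueConcat = if (.N > 14) paste(trimws(residue), collapse = "")),
  by = .(Sequence, grp)]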
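To make the run-length idea easier to follow, here is a minimal sketch of the same logic in base R using rle(). It is not the method from the answer above, only an illustration under stated assumptions: the helper name collapse_long_runs, its parameters flag and min_len, and the toy data are all made up for this example, and the threshold is lowered to 4 so that the small toy input produces a result.

```r
# Hypothetical helper (not part of the original answer): for every run of
# `flag` in `marks` that is at least `min_len` rows long, collapse the
# matching entries of `residue` into one string.
collapse_long_runs <- function(residue, marks, flag = "E", min_len = 15) {
  runs   <- rle(marks == flag)        # lengths and values of consecutive runs
  ends   <- cumsum(runs$lengths)      # last row index of each run
  starts <- ends - runs$lengths + 1   # first row index of each run
  keep   <- which(runs$values & runs$lengths >= min_len)
  vapply(keep,
         function(k) paste(trimws(residue[starts[k]:ends[k]]), collapse = ""),
         character(1))
}

# Toy data, invented for illustration: one run of 3 "E" and one run of 5 "E".
toy <- data.frame(
  residue = c("M", "D", "A", "R", "M", "R", "E", "L", "S", "F"),
  epitope = c(".", "E", "E", "E", ".", "E", "E", "E", "E", "E"),
  stringsAsFactors = FALSE
)

# With a threshold of 4, only the second run (rows 6 to 10) qualifies.
long_runs <- collapse_long_runs(toy$residue, toy$epitope, min_len = 4)
print(long_runs)
# [1] "RELSF"
```

With the default min_len = 15 the same call returns an empty character vector, because neither toy run is long enough. The result is a plain character vector, so it can be wrapped with as.list() or placed in a data.frame column if that storage is preferred.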