Find a row in a data frame using regular expressions

Posted by debugcn in Dev

Dr. Richard Tennen

I have a translation table (trans_df):

trans_df <- read.table(text = "rs1065852 rs201377835 rs28371706 rs5030655 rs5030865 rs3892097 rs35742686 rs5030656 rs5030867 rs28371725 rs59421388
                       G           C          G         A         C         C          T        CTT         T          C          C
                       G           C          G         A         C         C        del        CTT         T          C          C
                       A           C          G         A         C         T          T        CTT         T          C          C
                     del         del        del       del       del       del        del        del       del        del        del
                       G           C          G       del         C         C          T        CTT         T          C          C
                       G           C          G         A         C         C          T        CTT         G          C          C
                       G           C          G         A         C         C          T        del         T          C          C
                       A           C          G         A         C         C          T        CTT         T          C          C
                       G           C          A         A         C         C          T        CTT         T          C          C
                       G           C          G         A         C         C          T        CTT         T          C          T
                       G           C          G         A         C         C          T        CTT         T          T          C",header=TRUE, stringsAsFactors = FALSE, colClasses = "character")

and an input row (input):

input <- read.table(text = "rs1065852 rs201377835 rs28371706 rs5030655 rs5030865 rs3892097 rs35742686 rs5030656 rs5030867 rs28371725 rs59421388
G|A           C        G|A         A         C       T|C          T  CTT         T        C|T          C", header = TRUE, stringsAsFactors = FALSE, colClasses = "character")

I want to use regular expressions to find the rows of trans_df that match input. I have implemented it column by column:

Reduce(intersect,lapply(seq(1, ncol(trans_df)), 
                          function(i) {grep(pattern = input[, i], 
                          trans_df[, i])}))
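A runnable sketch of the column-by-column approach above, using the tables from the question. seq_len() is swapped in for seq(1, ...), which behaves the same here. For these tables the intersection is rows 1, 3, 8, 9 and 11. Note that grep() matches substrings, so a pattern such as "T" also matches "CTT".

```r
# Tables from the question
trans_df <- read.table(text = "rs1065852 rs201377835 rs28371706 rs5030655 rs5030865 rs3892097 rs35742686 rs5030656 rs5030867 rs28371725 rs59421388
G C G A C C T CTT T C C
G C G A C C del CTT T C C
A C G A C T T CTT T C C
del del del del del del del del del del del
G C G del C C T CTT T C C
G C G A C C T CTT G C C
G C G A C C T del T C C
A C G A C C T CTT T C C
G C A A C C T CTT T C C
G C G A C C T CTT T C T
G C G A C C T CTT T T C", header = TRUE, stringsAsFactors = FALSE, colClasses = "character")
input <- read.table(text = "rs1065852 rs201377835 rs28371706 rs5030655 rs5030865 rs3892097 rs35742686 rs5030656 rs5030867 rs28371725 rs59421388
G|A C G|A A C T|C T CTT T C|T C", header = TRUE, stringsAsFactors = FALSE, colClasses = "character")

# For each column i, grep() returns the row indices of trans_df[, i]
# whose value matches the regex in input[, i]
hits <- lapply(seq_len(ncol(trans_df)), function(i) {
  grep(pattern = input[, i], x = trans_df[, i])
})

# Keep only the rows that matched in every column
rows <- Reduce(intersect, hits)
print(rows)  # rows 1, 3, 8, 9 and 11

# grep() is unanchored: "T" is found inside "CTT"
grepl("T", "CTT")
```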

Is there a way to do this with pattern = input directly, instead of looping over column positions? Please advise.

Sotos

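You can do it with Map, which applies grep to each column of input together with the corresponding column of trans_df, i.e.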

Map(grep, input, trans_df)
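On its own, Map() returns a named list with one vector of matching row indices per column, so it still has to be combined with Reduce(intersect, ...), as in the question's code, to get the matching rows. A minimal sketch with the question's tables follows. The anchored variant is an addition, not part of the answer; it assumes each cell should match its pattern exactly rather than as a substring. For this data both versions give rows 1, 3, 8, 9 and 11.

```r
# Tables from the question
trans_df <- read.table(text = "rs1065852 rs201377835 rs28371706 rs5030655 rs5030865 rs3892097 rs35742686 rs5030656 rs5030867 rs28371725 rs59421388
G C G A C C T CTT T C C
G C G A C C del CTT T C C
A C G A C T T CTT T C C
del del del del del del del del del del del
G C G del C C T CTT T C C
G C G A C C T CTT G C C
G C G A C C T del T C C
A C G A C C T CTT T C C
G C A A C C T CTT T C C
G C G A C C T CTT T C T
G C G A C C T CTT T T C", header = TRUE, stringsAsFactors = FALSE, colClasses = "character")
input <- read.table(text = "rs1065852 rs201377835 rs28371706 rs5030655 rs5030865 rs3892097 rs35742686 rs5030656 rs5030867 rs28371725 rs59421388
G|A C G|A A C T|C T CTT T C|T C", header = TRUE, stringsAsFactors = FALSE, colClasses = "character")

# Map() calls grep(input[[j]], trans_df[[j]]) for each column j and
# returns a list of matching row indices named after input's columns
per_column <- Map(grep, input, trans_df)

# Rows that match in every column
rows <- Reduce(intersect, per_column)
print(rows)

# Assumed stricter variant: wrap each pattern as ^(...)$ so a cell must
# match in full, e.g. "T" no longer matches "CTT"
anchored <- Map(function(p, x) grep(paste0("^(", p, ")$"), x), input, trans_df)
rows_exact <- Reduce(intersect, anchored)
print(rows_exact)
```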

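However, this assumes that the columns of input and trans_df are in the same order. If they are not, you can use match to align them by name first, i.e.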

# reorder input's columns to follow trans_df's column order
Map(grep, input[match(names(trans_df), names(input))], trans_df)
# or, equivalently, reorder trans_df instead so that input stays intact
Map(grep, input, trans_df[match(names(input), names(trans_df))])

However, I think that is probably more than you need here.
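A sketch of the name-based alignment, using a hypothetical input_shuffled whose columns are in reverse order. Pairing by position then sends each pattern to the wrong column (the first pattern, "C", is tested against rs1065852, which never contains a C), so nothing matches. Reordering the patterns by name restores the pairing.

```r
# Tables from the question
trans_df <- read.table(text = "rs1065852 rs201377835 rs28371706 rs5030655 rs5030865 rs3892097 rs35742686 rs5030656 rs5030867 rs28371725 rs59421388
G C G A C C T CTT T C C
G C G A C C del CTT T C C
A C G A C T T CTT T C C
del del del del del del del del del del del
G C G del C C T CTT T C C
G C G A C C T CTT G C C
G C G A C C T del T C C
A C G A C C T CTT T C C
G C A A C C T CTT T C C
G C G A C C T CTT T C T
G C G A C C T CTT T T C", header = TRUE, stringsAsFactors = FALSE, colClasses = "character")
input <- read.table(text = "rs1065852 rs201377835 rs28371706 rs5030655 rs5030865 rs3892097 rs35742686 rs5030656 rs5030867 rs28371725 rs59421388
G|A C G|A A C T|C T CTT T C|T C", header = TRUE, stringsAsFactors = FALSE, colClasses = "character")

# Hypothetical query row with its columns in reverse order
input_shuffled <- input[rev(names(input))]

# Positional pairing: patterns land on the wrong columns, so no row matches
naive <- Reduce(intersect, Map(grep, input_shuffled, trans_df))

# Name-based pairing: put the patterns in trans_df's column order
rows <- Reduce(intersect, Map(grep, input_shuffled[names(trans_df)], trans_df))

# Same alignment via match(): where each trans_df name sits in input_shuffled
idx <- match(names(trans_df), names(input_shuffled))
rows_match <- Reduce(intersect, Map(grep, input_shuffled[idx], trans_df))
print(rows_match)
```

Indexing with input_shuffled[names(trans_df)] is the shorter form; match() is mainly useful if you also need the positions themselves.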
