Merge rows so that values are combined and NAs are ignored

Sebastian Zeki

I have a data frame like the following:

TIMEdbMerge     CopyNumber  Study  Sample      HRE
TC015II         NA          TC015  II          neg
TC015III        0           NA     NA          NA
TC015III        NA          TC015  III         neg
TC015Quadrantic NA          TC015  Quadrantic  24
TC016I          NA          TC016  I           NA
TC016II         1           NA     NA          NA
TC016II         NA          TC016  II          neg
TC016Quadrantic NA          TC016  Quadrantic  6
TC017I          NA          TC017  I           NA
TC017II         3           NA     NA          NA
TC017II         NA          TC017  II          +

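It is the result of a complicated merge that I have not had time to sort out. As a workaround, I just want to merge the duplicated rows so that the actual values in one row replace the NAs in its duplicate, so the result should look like this: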

TIMEdbMerge     CopyNumber  Study  Sample      HRE
TC015II         NA          TC015  II          neg
TC015III        0           TC015  III         neg
TC015           NA          TC015  Q           24
TC016I          NA          TC016  I           NA
TC016II         1           TC016  II          neg
TC016Quadrantic NA          TC016  Quadrantic  6
TC017I          NA          TC017  I           NA
TC017II         3           TC017  II          +

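I know how to remove duplicated rows, but I don't know how to tell R to merge duplicated rows so that a value is used only where one of the duplicates has a non-NA value. Should I use aggregate?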

akrun

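We can use na.locf together with ave to fill the NA elements of 'CopyNumber' with the non-NA element within each group ('TIMEdbMerge'), and then remove the rows that are NA in all of the 'Study', 'Sample', and 'HRE' columns. Because na.locf carries values forward, this relies on the row holding the 'CopyNumber' value coming before its duplicate, as it does in this data.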

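library(zoo)
# Within each TIMEdbMerge group, carry the last non-NA CopyNumber forward
df1$CopyNumber <- with(df1, ave(CopyNumber, TIMEdbMerge,
     FUN=function(x) na.locf(x, na.rm=FALSE)))
# Drop the rows where Study, Sample and HRE (columns 3 to 5) are all NA
df1[rowSums(is.na(df1[3:5]))!=3,]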
#       TIMEdbMerge CopyNumber Study     Sample  HRE
#1          TC015II         NA TC015         II  neg
#3         TC015III          0 TC015        III  neg
#4  TC015Quadrantic         NA TC015 Quadrantic   24
#5           TC016I         NA TC016          I <NA>
#7          TC016II          1 TC016         II  neg
#8  TC016Quadrantic         NA TC016 Quadrantic    6
#9           TC017I         NA TC017          I <NA>
#11         TC017II          3 TC017         II    +

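Or use left_join (or merge from base R) to join the original dataset with a subset of it that contains only the rows where 'CopyNumber' is not NA, and then, as above, filter out the rows that are NA in all 3 columns.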

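library(dplyr)
# Join the non-NA CopyNumber values back onto every row with the same key
left_join(df1, filter(df1, !is.na(CopyNumber)) %>%
                       select(1:2),
                 by='TIMEdbMerge') %>%
                 select(-2) %>%                     # drop the original, partly NA CopyNumber column
                 filter(rowSums(is.na(.[2:4]))!=3)  # drop rows where Study, Sample, HRE are all NA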
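
The base R merge route mentioned above could look roughly like the sketch below. It is only an illustration, not part of the original answer: the names has_cn, merged and res are made up here, and df1 is rebuilt inside the snippet so that it runs on its own.

```r
# Same data as in the question, rebuilt so the snippet is self-contained
df1 <- data.frame(
  TIMEdbMerge = c("TC015II", "TC015III", "TC015III", "TC015Quadrantic",
                  "TC016I", "TC016II", "TC016II", "TC016Quadrantic",
                  "TC017I", "TC017II", "TC017II"),
  CopyNumber = c(NA, 0L, NA, NA, NA, 1L, NA, NA, NA, 3L, NA),
  Study = c("TC015", NA, "TC015", "TC015", "TC016", NA, "TC016", "TC016",
            "TC017", NA, "TC017"),
  Sample = c("II", NA, "III", "Quadrantic", "I", NA, "II", "Quadrantic",
             "I", NA, "II"),
  HRE = c("neg", NA, "neg", "24", NA, NA, "neg", "6", NA, NA, "+"),
  stringsAsFactors = FALSE
)

# Rows that actually carry a CopyNumber, reduced to key + value
has_cn <- df1[!is.na(df1$CopyNumber), c("TIMEdbMerge", "CopyNumber")]

# Left join (all.x = TRUE) of everything except CopyNumber with those values
merged <- merge(df1[setdiff(names(df1), "CopyNumber")], has_cn,
                by = "TIMEdbMerge", all.x = TRUE)

# Drop the rows where Study, Sample and HRE are all NA
res <- merged[rowSums(is.na(merged[c("Study", "Sample", "HRE")])) != 3, ]
res
```

Note that merge sorts the result by the key column and places CopyNumber last, so the row order and column order can differ from the na.locf version.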

data

df1 <- structure(list(TIMEdbMerge = c("TC015II", "TC015III", 
"TC015III", 
"TC015Quadrantic", "TC016I", "TC016II", "TC016II", "TC016Quadrantic", 
"TC017I", "TC017II", "TC017II"), CopyNumber = c(NA, 0L, NA, NA, 
NA, 1L, NA, NA, NA, 3L, NA), Study = c("TC015", NA, "TC015", 
"TC015", "TC016", NA, "TC016", "TC016", "TC017", NA, "TC017"), 
Sample = c("II", NA, "III", "Quadrantic", "I", NA, "II", 
"Quadrantic", "I", NA, "II"), HRE = c("neg", NA, "neg", "24", 
NA, NA, "neg", "6", NA, NA, "+")), .Names = c("TIMEdbMerge", 
"CopyNumber", "Study", "Sample", "HRE"), class = "data.frame", 
row.names = c(NA, -11L))
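
Since the question asks about aggregate, here is a hedged sketch of that route: collapse each 'TIMEdbMerge' group by taking the first non-NA value in every column. This is an alternative to the approaches above rather than the answer's own method, and it assumes each group holds at most one distinct non-NA value per column. The helper first_non_na and the result name collapsed are invented for this example, and df1 is rebuilt so that the snippet runs on its own.

```r
# Same data as above, rebuilt so the snippet is self-contained
df1 <- data.frame(
  TIMEdbMerge = c("TC015II", "TC015III", "TC015III", "TC015Quadrantic",
                  "TC016I", "TC016II", "TC016II", "TC016Quadrantic",
                  "TC017I", "TC017II", "TC017II"),
  CopyNumber = c(NA, 0L, NA, NA, NA, 1L, NA, NA, NA, 3L, NA),
  Study = c("TC015", NA, "TC015", "TC015", "TC016", NA, "TC016", "TC016",
            "TC017", NA, "TC017"),
  Sample = c("II", NA, "III", "Quadrantic", "I", NA, "II", "Quadrantic",
             "I", NA, "II"),
  HRE = c("neg", NA, "neg", "24", NA, NA, "neg", "6", NA, NA, "+"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: first non-NA element of x, or NA if there is none
first_non_na <- function(x) x[!is.na(x)][1]

# The by= interface of aggregate keeps NA values in the data columns
# (the formula interface would drop those rows by default)
collapsed <- aggregate(df1[-1],
                       by = list(TIMEdbMerge = df1$TIMEdbMerge),
                       FUN = first_non_na)
collapsed
```

Unlike the na.locf version, this does not depend on which of the two duplicated rows comes first.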
