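I have a question about matching words from a library in one column (my original data has 2000 rows) against the words in another column (13 rows). I am dealing with NA values and padding rows of unequal length. Some words do match, so those should be matched, and the words without a match should be NA.
Here is some sample data.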
df <- data.frame(words = c("How","did","Quebec","nationalists","see","their","province","as","a","nation","in","the","1960s"))
# the original listed 10 embedding values for 8 library words, which makes data.frame() fail;
# only the first 8 values of each vector are kept here
df2 <- data.frame(library = c("How","see","as","a","for","then","than","example"),
                  embedding1 = c(.5,.6,.7,.8,.9,.3,.46,.48),
                  embedding2 = c(.1,.5,.4,.8,.9,.3,.98,.73))
Here I tried to match and combine the data with merge():
df<-merge(df, df2, all=T, na.rm=T)
This does not match the words to the column. Are there any suggestions on how to do this?
I would like my data to look like this...
df4 <- data.frame(words = c("How","did","Quebec","nationalists","see","their","province","as","a","nation","in","the","1960s"),
                  matched = c("How",NA,NA,NA,"see",NA,NA,"as","a",NA,NA,"the",NA),
                  embedding1 = c(.7,NA,NA,NA,.8,NA,NA,.9,.3,NA,NA,.6,NA),
                  embedding2 = c(.1,NA,NA,NA,.8,NA,NA,.9,.3,NA,NA,.5,NA))
I think you are missing the by.x and by.y arguments:
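# merge data: all.x = TRUE keeps every row of df (left join);
# all = TRUE gives a full join that also keeps the unmatched df2 rows, as in the printed output below
dd <- merge(x = df, y = df2, by.x = "words", by.y = "library", all.x = TRUE)
# add the matched word from df2 as its own column (NA where there is no match)
dd$library <- as.character(df2$library[match(dd$words, df2$library)])
print(dd)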
          words embedding1 embedding2 library
1           How       0.50       0.10     How
2           see       0.60       0.50     see
3            as       0.70       0.40      as
4             a       0.80       0.80       a
5         1960s         NA         NA    <NA>
6           did         NA         NA    <NA>
7        Quebec         NA         NA    <NA>
8  nationalists         NA         NA    <NA>
9         their         NA         NA    <NA>
10     province         NA         NA    <NA>
11          the         NA         NA    <NA>
12       nation         NA         NA    <NA>
13           in         NA         NA    <NA>
14          for       0.90       0.90     for
15         then       0.30       0.30    then
16         than       0.46       0.98    than
17      example       0.48       0.73 example
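If the goal is the 13-row layout from the question, with the words kept in their original order, a lookup with match() may be simpler than merge(). The following is only a sketch, not the answerer's method. It assumes the embedding vectors are trimmed to eight values so they line up with the eight library words, and the names idx and result are made up for illustration.

```r
# Sample data from the question; embeddings trimmed to 8 values (assumption)
df <- data.frame(words = c("How", "did", "Quebec", "nationalists", "see", "their",
                           "province", "as", "a", "nation", "in", "the", "1960s"),
                 stringsAsFactors = FALSE)
df2 <- data.frame(library = c("How", "see", "as", "a", "for", "then", "than", "example"),
                  embedding1 = c(.5, .6, .7, .8, .9, .3, .46, .48),
                  embedding2 = c(.1, .5, .4, .8, .9, .3, .98, .73),
                  stringsAsFactors = FALSE)

# For each word, match() gives its row position in df2, or NA when it is absent
idx <- match(df$words, df2$library)

# Indexing with NA yields NA, so unmatched words get NA in every new column
result <- data.frame(words = df$words,
                     matched = df2$library[idx],
                     embedding1 = df2$embedding1[idx],
                     embedding2 = df2$embedding2[idx],
                     stringsAsFactors = FALSE)
print(result)
```

Because match() is case-sensitive and exact, a word such as "the" stays NA unless it appears in the library column. Unlike merge(), this approach does not reorder the rows of df.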