How to change certain words into their tagged form from a training list

I am trying to replace certain words in my strings with their tagged forms from train.

train = c('love/POS','happy/POS','sad/NEG','fearsome/NEG','lazy/NEG')
test = c('I love you', 'I am so happy now', 'You look sad somehow', 'the lazy boy look so fearsome')

With these, I would like to produce the following result:

[1]'I love/POS you' 'I am so happy/POS now' 'You look sad/NEG somehow' 'the lazy/NEG boy look so fearsome/NEG'

Of course, I could use gsub the naive way, like this:

part1 = gsub('love', 'love/POS', test)
part2 = gsub('happy', 'happy/POS', part1)
.......

However, this approach does not scale at all once the training list gets larger.

To do this more efficiently, I tried the following:

process1 = unlist(strsplit(test, '[[:space:]]+'))

mgsub <- function(pattern, replacement, x, ...) {
  if (length(pattern)!=length(replacement)) {
    stop("pattern and replacement do not have the same length.")
  }
  result <- x
  for (i in 1:length(pattern)) {
    result <- gsub(pattern[i], replacement[i], result, ...)
  }
  result
}

trainedtest = mgsub(process1, train, test)
trainedtest

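In fact, it does not work at all, because process1 and train do not have the same length. Technically, I need a procedure that picks out the words to be changed into their tagged form from the train list by matching the words in process1 against train.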

Is there a way to do this?

Sotos

Here is a base R solution using match with nomatch = 0 (i.e. a non-match returns nothing; the default is NA):

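v1 <- sub('/.*', '', train)  # bare words with the /TAG suffix stripped
sapply(strsplit(test, ' '), function(i) {
  # grepl() flags tokens containing a training word; match() looks up their tagged forms
  i[grepl(paste(v1, collapse = '|'), i)] <- train[match(i, v1, nomatch = 0)]
  paste(i, collapse = ' ')
})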

#[1] "I love/POS you"    "I am so happy/POS now"  "You look sad/NEG somehow"             
#[4] "the lazy/NEG boy look so fearsome/NEG"
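
One caveat with the approach above: grepl does a regular-expression (substring) match while match does an exact match, so a token such as "lovely" would be flagged by grepl but not found by match, and the assignment would then fail. A minimal sketch of a variant that relies on exact matching only is given below. The helper name tag_words and its argument names are hypothetical and are not part of the original answer.

```r
train <- c('love/POS', 'happy/POS', 'sad/NEG', 'fearsome/NEG', 'lazy/NEG')
test <- c('I love you', 'I am so happy now', 'You look sad somehow',
          'the lazy boy look so fearsome')

# Bare words without the tag, in the same order as train
v1 <- sub('/.*', '', train)

# tag_words is a hypothetical helper name used for illustration only
tag_words <- function(sentences, words, tagged) {
  vapply(strsplit(sentences, '[[:space:]]+'), function(tokens) {
    idx <- match(tokens, words, nomatch = 0)  # 0 where the token is not in words
    tokens[idx > 0] <- tagged[idx]            # zero indices are dropped by `[`
    paste(tokens, collapse = ' ')
  }, character(1))
}

result <- tag_words(test, v1, train)
result
```

Because only whole tokens are compared, a word followed directly by punctuation (for example "happy!") is left untagged in this sketch; stripping punctuation before matching would be a separate step.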
