import spacy
ws = {}
nlp = spacy.load('de_core_news_sm')
data = 'Some long text'
train_corpus = nlp(data)
train_corpus = [token.text for token in train_corpus if not token.is_stop and len(token) > 4]
test_corpus = nlp('Some short sentence')
ae = train_corpus.similarity(test_corpus)
I get AttributeError: 'list' object has no attribute 'similarity' on this line:
ae = train_corpus.similarity(test_corpus)
If I remove the line
train_corpus = [token.text for token in train_corpus if not token.is_stop and len(token) > 4]
it works, but the stop words are still included.
How can I remove the stop words and still have this work?
Edit: ae = nlp(train_corpus).similarity(test_corpus)
raises TypeError: Argument 'string' has incorrect type (expected str, got list).
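Note that you are using a German model on English phrases. After the list comprehension, train_corpus is a plain list of strings, so you need to join the remaining tokens back into a single string and pass it through nlp again to get a spaCy object. Also, with your sample text the condition len(token) > 4 removes every token anyway, so the code below uses a longer sample text.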
import spacy
nlp = spacy.load('en_core_web_sm')
# nlp = spacy.load('de_core_news_sm')
ws = {}
#data = 'Some long text'
data = 'Some long text Elephant'
train_corpus = nlp(data)
train_corpus = nlp(" ".join([token.text for token in train_corpus if not token.is_stop and len(token) > 4]))
test_corpus = nlp('Some short sentence')
ae = train_corpus.similarity(test_corpus)
print(ae)
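If the same filtering is needed in more than one place, the join-and-reparse step above could be wrapped in a small helper. This is only a sketch: the function names filtered_text and similarity_without_stop_words and the min_len parameter are made up for illustration and are not part of spaCy. It uses len(token.text) >= 5, which should match the len(token) > 4 condition above, and it raises an error when nothing survives the filter, since comparing an empty text would not be meaningful.

```python
def filtered_text(doc, min_len=5):
    """Join the texts of tokens that are not stop words and have at least
    min_len characters. `doc` can be any iterable of token-like objects
    exposing `.text` and `.is_stop` (for example a spaCy Doc)."""
    return " ".join(
        token.text
        for token in doc
        if not token.is_stop and len(token.text) >= min_len
    )


def similarity_without_stop_words(nlp, train_text, test_text, min_len=5):
    """Filter `train_text`, re-parse it with `nlp`, and compare it to `test_text`.
    `nlp` is assumed to be a loaded spaCy pipeline (hypothetical helper, not spaCy API)."""
    cleaned = filtered_text(nlp(train_text), min_len)
    if not cleaned:
        # Every token was a stop word or too short, so there is nothing to compare.
        raise ValueError("no tokens left after filtering")
    # Re-parsing turns the filtered string back into a Doc, which has .similarity().
    train_doc = nlp(cleaned)
    test_doc = nlp(test_text)
    return train_doc.similarity(test_doc)
```

With the pipeline from the answer, a call would look like similarity_without_stop_words(nlp, 'Some long text Elephant', 'Some short sentence'). The small (_sm) models do not ship real word vectors, so the scores may be more useful with a larger model such as en_core_web_md, assuming it is installed.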