I am currently trying to solve this homework problem.
My task is to implement a function that returns a vector of word counts for a given text. I need to split the text into words, using NLTK's tokeniser to tokenise each sentence.
Here is my code so far:
import nltk
import collections
nltk.download('punkt')
nltk.download('gutenberg')
nltk.download('brown')

def word_counts(text, words):
    """Return a vector that represents the counts of specific words in the text

    >>> word_counts("Here is sentence one. Here is sentence two.", ['Here', 'two', 'three'])
    [2, 1, 0]
    >>> emma = nltk.corpus.gutenberg.raw('austen-emma.txt')
    >>> word_counts(emma, ['the', 'a'])
    [4842, 3001]
    """
    from nltk.tokenize import TweetTokenizer
    text = nltk.sent_tokenize(text)
    words = nltk.sent_tokenize(words)
    wordList = []
    for sen in text, words:
        for word in nltk.word_tokenize(sen):
            wordList.append(text, words).split(word)
    counter = TweetTokenizer(wordList)
    return counter
There are two doctests that should give the following results: [2, 1, 0] and [4842, 3001].
I have spent the whole day trying to solve this and I feel I am close, but I don't know what I am doing wrong; the script throws an error every time.
Any help would be much appreciated. Thank you.
You don't need sentence tokenisation here at all: tokenise the whole text into words, then let nltk.FreqDist do the counting and look up each target word.

import nltk
# nltk.download('punkt')
# nltk.download('gutenberg')

def word_counts(text, words):
    """Return a vector that represents the counts of specific words in the text

    >>> word_counts("Here is sentence one. Here is sentence two.", ['Here', 'two', 'three'])
    [2, 1, 0]
    >>> emma = nltk.corpus.gutenberg.raw('austen-emma.txt')
    >>> word_counts(emma, ['the', 'a'])
    [4842, 3001]
    """
    textTok = nltk.word_tokenize(text)
    counts = nltk.FreqDist(textTok)    # counts every word occurrence in the text
    return [counts[x] for x in words]  # FreqDist returns 0 for words it has not seen

r1 = word_counts("Here is sentence one. Here is sentence two.", ['Here', 'two', 'three'])
print(r1)  # [2, 1, 0]

emma = nltk.corpus.gutenberg.raw('austen-emma.txt')
r2 = word_counts(emma, ['the', 'a'])
print(r2)  # [4842, 3001]
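As an aside, if you ever need to do this without NLTK, the same counting step can be sketched with the standard library's collections.Counter (which your code already imports but never uses). Note the regex split below is only a rough stand-in for nltk.word_tokenize, so counts can differ on punctuation-heavy text:

```python
import re
from collections import Counter

def word_counts(text, words):
    """Count occurrences of each word in `words` within `text`."""
    # Crude tokeniser: runs of word characters. NLTK's tokeniser is smarter
    # about punctuation and contractions, so results may differ slightly.
    tokens = re.findall(r"\w+", text)
    counts = Counter(tokens)
    # Counter, like FreqDist, returns 0 for missing keys.
    return [counts[w] for w in words]

print(word_counts("Here is sentence one. Here is sentence two.",
                  ['Here', 'two', 'three']))  # [2, 1, 0]
```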