import os
from collections import Counter

direc = "emails/"
files = os.listdir(direc)
emails = [direc + email for email in files]
words = []
c = len(emails)
for email in emails:
    with open(email) as f:
        blob = f.read()
    words += blob.split()
    print(c)  # countdown of files left to read
    c -= 1
for i in range(len(words)):
    words[i] = words[i].lower()
dictionary = Counter(words)
print(dictionary.most_common(5000))
This code works fine when the emails are short, but when an email has more than 10 words it fails with the error "list index out of range" on the line words[i] = words[i].lower()
This:
for i in range(len(words)):
    words[i] = words[i].lower()
can be rewritten as:
words = map(str.lower, words)
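One caveat, as a rough sketch rather than the only way to do it: in Python 3, map returns a lazy iterator instead of a list, so a list comprehension is a simple alternative that keeps words as a list. The sample words below are made up for illustration and are not from the question.

```python
from collections import Counter

# Hypothetical sample data standing in for the words read from the emails.
words = ["Hello", "WORLD", "hello", "Spam"]

# Lowercase every word without indexing into the list.
words = [w.lower() for w in words]

dictionary = Counter(words)
print(dictionary.most_common(1))  # [('hello', 2)]
```

Counter accepts any iterable, so passing it the result of map directly would also work; the comprehension only matters if words is needed as a list afterwards.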
Iterating over a list by index, as in for i in range(len(x)), is almost always a code smell. If you need the index, you should use enumerate.
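For the case where the index really is needed, a minimal sketch with enumerate might look like this (the sample list is hypothetical, chosen only to illustrate the pattern):

```python
# Hypothetical sample data for illustration.
words = ["Hello", "WORLD", "Spam"]

# enumerate yields (index, item) pairs, so range(len(words)) is not needed.
for i, word in enumerate(words):
    words[i] = word.lower()

print(words)  # ['hello', 'world', 'spam']
```

Assigning to words[i] inside the loop is safe here because the list's length never changes while it is being iterated.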