How to create a more efficient way of parsing words between two large text files (Python 3.6.4)

Posted by debugcn in Dev

Mars

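I'm new to Python, and this is my first attempt at applying what I've learned, but I know my code is inefficient. It works, but it takes several minutes to finish on a novel-sized text file.

Is there a more efficient way to get the same output? Any critique of my style would also be much appreciated. Thanks!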

def realWords(inFile, dictionary, outFile):
    with open(inFile, 'r') as inf, open(dictionary, 'r') as dictionary, open(outFile, 'w') as outf:
        realWords = ''
        dList = []
        for line in dictionary:
            dSplit = line.split()
            for word in dSplit:
                dList.append(word)
        for line in inf:
            wordSplit = line.split()
            for word in wordSplit:
                if word in dList:
                    realWords += word + ' '
        outf.write(realWords)
        print('File of real words created')
        inf.close()
        dictionary.close()
        outf.close()

'''
I created a function to compare the words in a text file to real words taken
from a reference dictionary (like the Webster Unabridged Dictionary). It
takes a text file and breaks it up into individual word components. It then
compares each word to each word in the reference dictionary text file in
order to test whether the word is a real word or not. This is done so as to
eliminate non-real words, names, and some other junk. Each word that passes
the test is appended to the same output string. Once all words have been
parsed, the output string containing all real words is written to a new
text file.
'''

Primusa

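For every word in the novel, you search the entire dictionary list to see whether that word is in it. With a list, each of those checks scans the entries one by one, which is really slow.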

You would benefit from using a set() data structure, which lets you check whether an element is in it in constant time on average.
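To get a feel for the difference, here is a minimal timing sketch. The 50,000 generated strings are only a hypothetical stand-in for a real dictionary file, and the exact timings will vary by machine; the point is just that `in` on a list scans the elements while `in` on a set does a hash lookup.

```python
import timeit

# Hypothetical stand-in for the dictionary file: 50,000 distinct "words".
words = ['word{}'.format(i) for i in range(50000)]
word_list = list(words)
word_set = set(words)

# Worst case for the list: the target is the last element,
# so `in` has to compare against every entry before it.
target = 'word49999'

# Run the same membership test 200 times against each container.
list_time = timeit.timeit(lambda: target in word_list, number=200)
set_time = timeit.timeit(lambda: target in word_set, number=200)

print('list lookups: {:.4f}s'.format(list_time))
print('set lookups:  {:.4f}s'.format(set_time))
```

Both containers give the same answer to the membership test; only the cost of getting that answer differs.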

In addition, you can speed the code up a bit more by getting rid of the repeated string concatenation and using .join() instead.
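As a small illustration of the two approaches, here is a sketch with two hypothetical helper functions (`concat_words` and `join_words` are names made up for this example). Note one behavioural difference: the `+=` version leaves a trailing space, while `' '.join()` does not.

```python
def concat_words(words):
    """Build the output with repeated +=, as in the question's code."""
    result = ''
    for word in words:
        result += word + ' '  # may copy the growing string on each step
    return result


def join_words(words):
    """Collect the words first, then build the string once."""
    return ' '.join(words)


sample = ['the', 'quick', 'brown', 'fox']
print(repr(concat_words(sample)))  # 'the quick brown fox '
print(repr(join_words(sample)))    # 'the quick brown fox'
```

In CPython the `+=` pattern is often optimised in place, so the gain from this change is usually much smaller than the gain from switching the dictionary to a set. `.join()` is still the more idiomatic choice.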

I made a few adjustments to your code so that it uses set() and .join(), which should speed it up considerably:

def realWords(inFile, dictionary, outFile):
    with open(inFile, 'r') as inf, open(dictionary, 'r') as dictionary, open(outFile, 'w') as outf:
        realWords = []  # note: a list, for constant-time appends
        dList = set()
        for line in dictionary:
            dSplit = line.split()
            for word in dSplit:
                dList.add(word)
        for line in inf:
            wordSplit = line.split()
            for word in wordSplit:
                if word in dList:  # done in constant time because dList is a set
                    realWords.append(word)
        outf.write(' '.join(realWords))
        print('File of real words created')
    # No explicit close() calls are needed: the with block closes all three files.
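Since you also asked about style, here is one possible more compact variant of the same idea. It is only a sketch: the function name `real_words_compact`, the snake_case parameter names, and the tiny temporary files in the usage part are all made up for illustration. It assumes the dictionary file fits comfortably in memory.

```python
import os
import tempfile


def real_words_compact(in_path, dict_path, out_path):
    """Write the words of in_path that also appear in dict_path to out_path."""
    # Read the whole dictionary once and keep its words in a set.
    with open(dict_path, 'r') as dict_file:
        known = set(dict_file.read().split())

    with open(in_path, 'r') as in_file, open(out_path, 'w') as out_file:
        # Generator: yields each input word that is in the dictionary set.
        kept = (word for line in in_file for word in line.split() if word in known)
        out_file.write(' '.join(kept))


# Usage with small throwaway files (hypothetical contents).
with tempfile.TemporaryDirectory() as tmp:
    dict_path = os.path.join(tmp, 'dictionary.txt')
    in_path = os.path.join(tmp, 'novel.txt')
    out_path = os.path.join(tmp, 'real_words.txt')

    with open(dict_path, 'w') as f:
        f.write('cat sat the\non mat\n')
    with open(in_path, 'w') as f:
        f.write('The cat sat\non teh mat xyzzy\n')

    real_words_compact(in_path, dict_path, out_path)

    with open(out_path, 'r') as f:
        result = f.read()

print(result)  # cat sat on mat
```

As in your original code, matching is exact: "The" is dropped here because only the lowercase "the" is in the dictionary, and a word with attached punctuation such as "mat." would not match either. Whether to lowercase or strip punctuation first depends on what you want in the output.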
