I have a text file of a book that I want to read into a Python program and split into sentences using open("book.txt").read().split(".").
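The problem is that the file contains newlines and runs of multiple spaces. I want the text to be just words separated by single spaces, with every newline turned into one space.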
My book.txt currently looks like this (excerpt):
To Sherlock Holmes she is always the woman. I have seldom
heard him mention her under any other name. In his eyes she
eclipses and predominates the whole of her sex. It was not that
he felt any emotion akin to love for Irene Adler. All emotions,
and that one particularly, were abhorrent to his cold, precise but
admirably balanced mind. He was, I take it, the most perfect
reasoning and observing machine that the world has seen, but as
a lover he would have placed himself in a false position. He
never spoke of the softer passions, save with a gibe and a sneer.
It sounds like you just want to remove all the newlines and the leading/trailing whitespace...
Maybe something like...
import re
sentences = [re.sub(r"^\s*|\s*$", "", re.sub(r"\n", " ", each)) for each in open("book.txt").read().split(".")]
Or, if tabs are a problem too...
sentences = [re.sub(r"^\s*|\s*$", "", re.sub(r"\s+", " ", each)) for each in open("book.txt").read().split(".")]
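As a possible alternative for the whitespace cleanup alone, str.split() with no arguments splits on any run of whitespace (spaces, tabs, newlines), so joining the pieces back with a single space gives the flattened text. This is only a sketch; normalize_whitespace is a name made up here for illustration.

```python
def normalize_whitespace(text):
    """Collapse every run of whitespace (spaces, tabs, newlines) into one space.

    Hypothetical helper for illustration; str.split() with no arguments
    splits on any whitespace and drops leading/trailing whitespace.
    """
    return " ".join(text.split())


# Sample text standing in for the contents of book.txt.
sample = "I have seldom\nheard him   mention\ther under any other name."
flattened = normalize_whitespace(sample)
```

The same function could be applied to open("book.txt").read() before splitting on ".".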
You could also split on ?, ! or . using...
sentences = [re.sub(r"^\s*|\s*$", "", re.sub(r"\s+", " ", each)) for each in re.split(r"[?.!]", open("book.txt").read())]
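Note that splitting on the punctuation itself throws the ".", "?" or "!" away and usually leaves an empty string at the end of the list. If that matters, one option is to flatten the whitespace first and then split on the space that follows a sentence terminator. The sketch below assumes that approach; split_sentences is a hypothetical name, and abbreviations such as "Mr." would still be split incorrectly.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Flatten whitespace, then split after '.', '!' or '?'.

    Hypothetical helper for illustration. The lookbehind keeps the
    terminating punctuation attached to each sentence.
    """
    # Collapse newlines, tabs and repeated spaces into single spaces.
    flat = " ".join(text.split())
    # Split on whitespace that directly follows a sentence terminator.
    parts = re.split(r"(?<=[.!?])\s+", flat)
    # Drop the empty string produced when the input is empty.
    return [part for part in parts if part]


# Sample text standing in for the contents of book.txt.
sample = (
    "To Sherlock Holmes she is always the woman. I have seldom\n"
    "heard him mention her under any other name."
)
sentences = split_sentences(sample)
```

To use it on the real file, pass in the file contents, for example by reading book.txt inside a with open("book.txt") as f: block so the file is closed afterwards.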