I want to open a Pages document like this:
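directory = "/path/to/file/"
# open() expects a regular file; this is the call that raises the error
with open(directory + "test.pages") as f:
    data = f.readlines()
# split each line into whitespace-separated words
for line in data:
    words = line.split()
    print(words)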
Then I get this error:
IOError: [Errno 21] Is a directory: '/path/to/file/test.pages'
Why is this a directory? And how do I open it?
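'/path/to/file/test.pages' is a directory on the filesystem, so it can't be opened as a file in Python. Your operating system bundles the multiple files in that directory and probably presents them as a single package. You could conceivably walk the directory and list its contents: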
import os

# walk the bundle directory and print the full path of every file in it
for root, dirs, files in os.walk('/path/to/file/test.pages'):
    for name in files:
        print(os.path.join(root, name))
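To see why open() fails, it can help to check what kind of filesystem entry the path is before opening it. The following is only a minimal sketch: describe_path is a hypothetical helper name, not part of any library, and the path is the placeholder from the question.

```python
import os

def describe_path(path):
    """Return 'directory', 'file', or 'missing' for the given path."""
    if os.path.isdir(path):
        return 'directory'
    if os.path.isfile(path):
        return 'file'
    return 'missing'

# Placeholder path from the question; substitute your own.
print(describe_path('/path/to/file/test.pages'))
```

If this reports 'directory', open() on that path will fail with the "Is a directory" error shown in the question, and the files inside have to be opened individually instead.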
But opening those files and trying to read their contents is quite likely futile.
Here is how you could try to find any plain text in them:
import os
import re

# match any line containing at least one word character (letter, digit, or _)
pattern = re.compile(r'.*\w+.*')
for root, dirs, files in os.walk('/path/to/file/test.pages'):
    for name in files:
        # open each file with the context manager so it's automatically closed
        # even if there's an error. In Python 3, text mode already translates
        # newlines (the old 'U' flag was removed in 3.11); errors='replace'
        # keeps binary content from raising UnicodeDecodeError.
        with open(os.path.join(root, name), 'r', errors='replace') as f:
            for line in f:
                if pattern.match(line):
                    print(line)
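Because most files in a bundle like this are probably binary, one possible refinement is to read each file as bytes and skip anything that contains a NUL byte, a common but imperfect heuristic for binary data. This is a sketch under that assumption: looks_binary and iter_text_lines are hypothetical helper names, and decoding as UTF-8 with replacement characters is a guess about the encoding, not something known about the files in the bundle.

```python
import os

def looks_binary(chunk):
    """Heuristic: treat data containing a NUL byte as binary."""
    return b'\x00' in chunk

def iter_text_lines(top):
    """Yield (path, line) for non-blank lines in files that look like text."""
    for root, dirs, files in os.walk(top):
        for name in files:
            path = os.path.join(root, name)
            with open(path, 'rb') as f:
                data = f.read()
            # only the first 1024 bytes are inspected by the heuristic
            if looks_binary(data[:1024]):
                continue  # skip files that are probably binary
            # Assumption: UTF-8; undecodable bytes become U+FFFD.
            for line in data.decode('utf-8', errors='replace').splitlines():
                if line.strip():
                    yield path, line

# Placeholder path from the question; substitute your own.
for path, line in iter_text_lines('/path/to/file/test.pages'):
    print(path, line)
```

Yielding the path alongside each line makes it easier to tell which file inside the bundle a piece of text came from.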