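I am trying to read a large file (1.1 GB) into Python. The file contains the word "HERE", but I don't know on which line it will appear. I read the file in chunks, and my first chunk is everything up to the word "HERE". So far my code works fine (i.e. it stores the data before "HERE" and processes it). However, the data after "HERE" is too large, so I cannot read it in as a single chunk. Is there any way to read the data after "HERE" line by line? I referred to: Read file until specific character in python. My code is: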
def each_chunk(stream, separator):
    buffer = ''
    while True:  # until EOF
        chunk = stream.read()  # I propose 4096 or so
        if not chunk:  # EOF?
            yield buffer
            break
        buffer += chunk
        while True:  # until no separator is found
            try:
                part, buffer = buffer.split(separator, 1)
            except ValueError:
                break
            else:
                yield part

def first_chunk(chunk):
    .... #my function

def chunk_after(data_line_by_line):
    .... #my function

global This_1st_chunk
This_1st_chunk = True
myFile = open(r"C:\Users\Mavis\myFile.txt", "r")
for chunk in each_chunk(myFile, separator='HERE'):
    if This_1st_chunk:
        first_chunk(chunk)
        This_1st_chunk = False
    elif not This_1st_chunk:
        print('*******after 1st chunk*********')
        #**I WANT TO READ THE DATA LINE BY LINE HERE.**
        chunk_after(data_line_by_line)
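It may be simpler to read the file line by line up to the end of the first chunk (delimited by "HERE"), collecting the lines as you go, then process that chunk, and then continue reading the rest of the file line by line.
Something like this: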
with open(r"C:\Users\Mavis\myFile.txt", "r") as myFile:
    chunk = []
    remainder = ''
    first_chunk_found = False
    while not first_chunk_found:
        line = myFile.readline()
        if not line:  # EOF reached without finding "HERE"
            break
        if "HERE" in line:
            first_chunk_found = True
            # maxsplit=1 so a second "HERE" on the same line does not break the unpacking
            line, remainder = line.split("HERE", 1)
            line += "HERE"  # current line up to "HERE"
        chunk.append(line)
    chunk = ''.join(chunk)
    # do whatever you want with the first chunk here.
    # also, the variable remainder has the rest of the line
    # that contained the word "HERE", in case you want it
    for line in myFile:
        # now we process the rest of the file line by line
        pass
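
If it helps, the same idea can be wrapped in a small helper so it is easy to try on sample data first. This is only a sketch under a few assumptions: the name read_until_marker, the found flag it returns, and the sample text are made up for illustration, and an in-memory io.StringIO stands in for the real 1.1 GB file. It uses str.partition instead of split, which never raises when the marker is missing or appears more than once on a line.

```python
import io


def read_until_marker(stream, marker="HERE"):
    """Consume lines from `stream` up to and including the first `marker`.

    Returns (first_chunk, remainder, found):
      first_chunk - all text up to and including the marker
      remainder   - the rest of the line that contained the marker
      found       - False if EOF was reached without seeing the marker
    """
    parts = []
    # readline() returns "" at EOF, which ends this loop.
    for line in iter(stream.readline, ""):
        before, sep, after = line.partition(marker)
        if sep:  # marker found on this line
            parts.append(before + sep)
            return "".join(parts), after, True
        parts.append(line)
    return "".join(parts), "", False


# Made-up sample data standing in for the real file.
sample = io.StringIO("alpha\nbeta HERE gamma\ndelta\nepsilon\n")
first_chunk, remainder, found = read_until_marker(sample)

# The stream is now positioned just after the marker's line,
# so plain iteration yields the remaining lines one at a time.
rest_lines = [line.rstrip("\n") for line in sample]
print(repr(first_chunk), repr(remainder), rest_lines)
```

With a real file you would pass the object returned by open(...) instead of the StringIO, and loop over it afterwards rather than building a list. Only the part before "HERE" is held in memory; everything after it is streamed line by line.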