Split a large file into smaller chunks after a specific character in Python?

Mavis_Ackerman

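I am trying to read a large file (1.1 GB) into Python. The file contains the word "HERE", but I do not know on which line it will appear. I read the file in chunks, and my first chunk is everything up to the word "HERE". So far my code works well (that is, it stores the data before "HERE" and processes it). However, the data after "HERE" is too large, so I cannot go on to read it. Is there a way to read the data after "HERE" line by line? I referred to: Read file until a specific character in python. My code is: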

def each_chunk(stream, separator):
  buffer = ''
  while True:  # until EOF
    chunk = stream.read()  # I propose 4096 or so
    if not chunk:  # EOF?
      yield buffer
      break
    buffer += chunk
    while True:  # until no separator is found
      try:
        part, buffer = buffer.split(separator, 1)
      except ValueError:
        break
      else:
        yield part

def first_chunk(chunk):
    ...  # my function

def chunk_after(data_line_by_line):
    ...  # my function

global This_1st_chunk
This_1st_chunk=True

myFile= open(r"C:\Users\Mavis\myFile.txt","r")
for chunk in each_chunk(myFile, separator='HERE'):
    if This_1st_chunk:
        first_chunk(chunk)
        This_1st_chunk=False
    elif not This_1st_chunk:
        print('*******after 1st chunk*********')
        #**I WANT TO READ THE DATA LINE BY LINE HERE.**
        chunk_after(data_line_by_line)

面孔

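It may be simpler to read the file line by line up to the end of the first chunk (delimited by "HERE"), collect those lines, process that chunk, and then continue reading the rest of the file line by line.

Like this: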

with open(r"C:\Users\Mavis\myFile.txt", "r") as myFile:
    chunk = []
    remainder = ""
    first_chunk_found = False
    while not first_chunk_found:
        line = myFile.readline()
        if not line:
            # reached EOF without finding "HERE"; stop instead of looping forever
            break
        if "HERE" in line:
            first_chunk_found = True
            # split only at the first "HERE" in case the line contains several
            line, remainder = line.split("HERE", 1)
            line += "HERE"  # current line up to and including "HERE"
        chunk.append(line)
    chunk = ''.join(chunk)
    # do whatever you want with the first chunk here.
    # also, the variable remainder has the rest of the line
    # that contained the word "HERE", in case you want it
    for line in myFile:
        # now we process the rest of the file line by line
        pass
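
The same idea can be wrapped in a small helper so the calling code stays short. The following is only a sketch under a few assumptions: the name split_at_marker is hypothetical (it does not appear in the question or the answer), the marker is assumed not to contain a newline, and the text before the marker is assumed to be small enough to hold in memory. It returns the first chunk as a string plus an iterator that yields the rest of the file lazily, one line at a time.

```python
import io
from typing import Iterator, TextIO, Tuple


def split_at_marker(stream: TextIO, marker: str = "HERE") -> Tuple[str, Iterator[str]]:
    """Hypothetical helper: return (text up to and including the first marker,
    iterator over everything after it, line by line)."""
    head = []
    remainder = ""
    for line in stream:
        if marker in line:
            # split only at the first occurrence on this line
            before, remainder = line.split(marker, 1)
            head.append(before + marker)
            break
        head.append(line)

    def rest() -> Iterator[str]:
        # first the tail of the line that contained the marker, if any
        if remainder:
            yield remainder
        # then the remaining lines, read lazily from the same stream
        yield from stream

    return "".join(head), rest()


# Small in-memory demonstration; a real file object from open() works the same way.
demo = io.StringIO("a\nb HERE c\nd\ne\n")
first, remaining_lines = split_at_marker(demo)
for line in remaining_lines:
    pass  # process each line after "HERE" here
```

If the marker never appears, the helper returns the whole file as the first chunk and an empty iterator, so for a very large file without the marker the caller would still run out of memory; checking whether the returned chunk ends with the marker is one way to detect that case.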

Edited on 2021-04-2
