How to read a string containing newlines and store it in a Pandas DataFrame or Python list

Posted by debugcn in Dev

自卫队

I am reading a large text file with a custom data format, like this:

# a with block closes the file again once it has been read
with open(file, "r") as file_object:
    contents = file_object.read()

Printing contents gives the following (the whole "object" is just one string containing newlines):

object name {
    # Data Type 1
    burgers [taste="good" type="food"];
    sushi [taste="good" type="food"];

    # Data Type 2
    NYC [population="300" type="urban"];
    
    # Data Type 3
    NYC -> CHI [distance="15.0"];

    LA -> SF [distance="2.0"];
}

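The data is split into 3 sections, each introduced by a line starting with #. The line breaks within and between sections can be inconsistent, so I want to first remove all empty lines and then work out how to remove the tabs/spaces in front of the data on each line, to get: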
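One way to do both cleanup steps at once is to iterate over the file object line by line instead of calling read(): str.strip() removes the leading tabs/spaces and the trailing newline, and lines that are empty after stripping are skipped. This is a minimal sketch; read_clean_lines is a hypothetical helper name, and the temporary file only stands in for the real data file.

```python
import os
import tempfile


def read_clean_lines(path):
    """Return the non-blank lines of a text file with surrounding whitespace removed."""
    cleaned = []
    with open(path, "r") as file_object:
        for line in file_object:
            stripped = line.strip()  # drops tabs/spaces and the trailing newline
            if stripped:  # skip empty and whitespace-only lines
                cleaned.append(stripped)
    return cleaned


# Usage with a small temporary file standing in for the real data file
# (it includes a spaces-only line and an empty line).
sample = 'object name {\n    # Data Type 1\n    \n    sushi [taste="good" type="food"];\n\n}\n'
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
lines = read_clean_lines(tmp.name)
os.remove(tmp.name)
print(lines)
```

Because the file is consumed one line at a time, the whole raw text never has to be held as a single string, which may help with a large file.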

object name {
# Data Type 1
burgers [taste="good" type="food"];
sushi [taste="good" type="food"];
# Data Type 2
NYC [population="300" type="urban"];
# Data Type 3
NYC -> CHI [distance="15.0"];
LA -> SF [distance="2.0"];
}

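From there I need to work out how to split it into the 3 corresponding sections. I'm not sure which data structure is best, since the line format differs from section to section (or whether there is a simpler way to read this content). Any suggestions would be appreciated!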
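If the end goal is something tabular, one option is to parse each cleaned data line into a dictionary. The sketch below assumes, based only on the three sample sections, that every data line is either NAME [key="value" ...]; or SOURCE -> TARGET [key="value" ...];. The regular expressions, the parse_line name and the kind/name/source/target keys are illustrative choices, not part of the original format, and all values are kept as strings.

```python
import re

# Assumed line shapes, inferred from the sample data only:
#   node:  NAME [key="value" ...];
#   edge:  SOURCE -> TARGET [key="value" ...];
LINE_RE = re.compile(r'^(?P<head>[^\[]+?)\s*\[(?P<attrs>[^\]]*)\];$')
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_line(line):
    """Turn one cleaned data line into a dict, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None  # e.g. "# Data Type 1", "object name {" or "}"
    # every key="value" pair becomes a dict entry (values stay strings)
    record = dict(ATTR_RE.findall(match.group("attrs")))
    head = match.group("head")
    if "->" in head:
        source, target = (part.strip() for part in head.split("->", 1))
        record.update(kind="edge", source=source, target=target)
    else:
        record.update(kind="node", name=head.strip())
    return record


print(parse_line('burgers [taste="good" type="food"];'))
print(parse_line('NYC -> CHI [distance="15.0"];'))
```

A list of such dictionaries can be passed to pandas.DataFrame to get one row per line, with attributes a row does not have filled in as NaN.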

丹妮丝

Here is the code:

contents = """
object name {
    # Data Type 1
    burgers [taste="good" type="food"];
    sushi [taste="good" type="food"];

    # Data Type 2
    NYC [population="300" type="urban"];

    # Data Type 3
    NYC -> CHI [distance="15.0"];

    LA -> SF [distance="2.0"];
}
"""

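all_lines = contents.split("\n")

# keep only lines that contain non-whitespace characters, and strip the
# leading/trailing whitespace (tabs, spaces) from each of them
selected_lines = [line.strip() for line in all_lines if line.strip()]

new_contents = "\n".join(selected_lines)

print(new_contents)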

The result is in new_contents.
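The choice of filter matters for the question's sample, which contains an indented blank line made only of spaces. Such a line has a non-zero length, so a len(line) > 0 filter keeps it (as an empty string once stripped), while line.strip() is falsy for it and drops it. A small comparison on a few hand-written lines (a made-up excerpt, not the full data):

```python
# A few raw lines resembling the start of the question's sample:
# the second one contains only spaces, the last one is truly empty.
raw_lines = ["object name {", "    ", "    # Data Type 1", ""]

# Length-based filter: the spaces-only line survives and becomes ""
by_length = [line.strip() for line in raw_lines if len(line) > 0]

# Content-based filter: anything that is empty after stripping is dropped
by_content = [line.strip() for line in raw_lines if line.strip()]

print(by_length)
print(by_content)
```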

Edit (in reply to a comment):

At this point you can split the string into sections:

lines = new_contents.split("\n")

# drop the first and last lines ("object name {" and "}")
lines = lines[1:-1]

sections = {}
key = None
for line in lines:
    if line.startswith("#"):
        # a header line starts a new section, e.g. "Data Type 1"
        key = line.lstrip("#").strip()
        # the new section starts out as an empty list
        sections[key] = []
    elif key is not None:
        # append the data row to the current section
        sections[key].append(line)

print(sections)
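A dict of lists is not the only option. Since the title mentions pandas, another is a single flat list of rows that records which section each line came from (a "long" table). The sketch below uses a hard-coded dict shaped like the sections result above so that it runs on its own; "section" and "line" are illustrative column names.

```python
# Hypothetical literal shaped like the `sections` dict built above.
sections = {
    "Data Type 1": ['burgers [taste="good" type="food"];', 'sushi [taste="good" type="food"];'],
    "Data Type 2": ['NYC [population="300" type="urban"];'],
    "Data Type 3": ['NYC -> CHI [distance="15.0"];', 'LA -> SF [distance="2.0"];'],
}

# Flatten into one list of rows, keeping the section name on every row
# so that the grouping can be recovered later.
rows = [
    {"section": section, "line": line}
    for section, section_lines in sections.items()
    for line in section_lines
]

for row in rows:
    print(row)
```

If pandas is installed, pandas.DataFrame(rows) turns this list of dicts into a two-column frame, and grouping by "section" gives the per-section view back.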
