How to convert single-quoted string elements in a Python list to double quotes


NewCoada

I am preprocessing data for an NLP task and need to structure it in the following way:

[tokenized_sentence] tab [tags_corresponding_to_tokens]

I have a text file containing thousands of lines in this format, with the two lists separated by a tab character. Here is an example:

['EU', 'rejects', 'German', 'call', 'to', 'boycott', 'British', 'lamb', '.']    ['I-ORG', 'O', 'I-MISC', 'O', 'O', 'O', 'I-MISC', 'O', 'O']

The code I used to produce this file is

with open('data.txt', 'w') as foo:
    for i,j in zip(range(len(text)),range(len(tags))):
        foo.write(str([item for item in text[i].split()]) + '\t' + str([tag for tag in tags[j]]) + '\n')

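where text is a list of sentences (each sentence is a string) and tags is a list of tag lists (each inner list holds the tags corresponding to the words/tokens of one sentence).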

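I need the string elements inside the lists to have double quotes instead of single quotes while keeping this structure. The expected output should look like this: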

["EU", "rejects", "German", "call", "to", "boycott", "British", "lamb", "."]    ["I-ORG",  "O", "I-MISC", "O", "O", "O", "I-MISC", "O", "O"]

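I tried json.dump() and json.dumps() from Python's json module, but did not get the expected output. Instead, I got the two lists as strings. My best effort was to add the double quotes manually like this (for the tags):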

for i in range(len(tags)):
    for token in tags[i]:
        tkn = "\"%s\"" %token
        print(tkn)

This gives the output

"I-ORG"
"O"
"I-MISC"
"O"
"O"
"O"
"I-MISC"
"O"
"O"
"I-PER"
"I-PER"
.
.
.

However, this seems too inefficient. I have looked at related questions, but they do not address this problem directly.

I am using Python 3.8.

AAAlex123

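I'm fairly sure there is no way to force Python's built-in list-to-string conversion to write strings with double quotes; it defaults to single quotes. As @deadshot commented, you can either replace ' with " after writing the whole string to the file, or add the double quotes manually as you write each word. The answers to this post show many different ways to do that, the simplest being f'"{your_string_here}"'. However, you would then need to write each string separately, because writing a list automatically puts ' around each item, and that would get very spaghetti.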
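The manual-quoting option can be sketched as follows. This is only an illustration: quote_list is a hypothetical helper name, and the sample text and tags values stand in for the question's real data. It assumes no token itself contains a double quote, since nothing is escaped here.

```python
def quote_list(items):
    """Format a list of strings as ["a", "b", ...] using double quotes."""
    # Each item is wrapped with the f'"{...}"' pattern, then joined by ", ".
    return "[" + ", ".join(f'"{item}"' for item in items) + "]"


# Hypothetical sample data standing in for the question's `text` and `tags`.
text = ["EU rejects German call"]
tags = [["I-ORG", "O", "I-MISC", "O"]]

# Build one tab-separated line per sentence, as in the question's format.
lines = [
    quote_list(sentence.split()) + "\t" + quote_list(sentence_tags)
    for sentence, sentence_tags in zip(text, tags)
]
print(lines[0])
```

Each element of lines could then be written to the file followed by a newline, in place of the str(...) calls in the question's loop.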

The simplest fix is to find and replace ' with " after the strings have been written to the file.

You can even do this with Python:

# run this after the lists have been written to 'data.txt'
with open('data.txt', "r") as f:
    text = f.read()

# replace every single quote with a double quote
text = text.replace("'", '"')

with open('data.txt', "w") as f:
    f.write(text)
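One case where a blanket replacement may misbehave is a token that itself contains an apostrophe. The small sketch below uses a made-up token list (not data from the question) to show what str() writes for such a list and what the replacement then does to it.

```python
# Hypothetical tokens; "don't" contains an apostrophe.
tokens = ["don't", "stop"]

# str() on a list is how the question's code serializes each list.
written = str(tokens)

# Blanket replacement of every single quote.
replaced = written.replace("'", '"')

print(written)   # Python already double-quotes the token holding an apostrophe
print(replaced)  # the apostrophe inside the token has been turned into a quote
```

Because Python already switches to double quotes for a string that contains a single quote, the replacement only damages the apostrophe inside that token.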

Edit based on the OP's comment below

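Use this instead of the above. It should solve most of the problems, because it searches for ', ' which should only appear where one string ends and the next begins.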

with open('data.txt', "r") as f:
    text = f.read()

# replace ' at the start of each list
text = text.replace("['", '["')

# replace ' at the end of each list
text = text.replace("']", '"]')

# replace ' between adjacent items inside a list
text = text.replace("', '", '", "')

with open('data.txt', "w") as f:
    f.write(text)

(Edited by the OP) New edit based on my latest comment

Running this solved the problem I described in the comments and produced the expected result.

with open('data.txt', "r") as f:
    text = f.read()

# replace ' at the start of each list
text = text.replace("['", '["')

# replace ' at the end of each list
text = text.replace("']", '"]')

# replace ' between adjacent items inside a list
text = text.replace("', '", '", "')

# replace a remaining ' that closes an item
text = text.replace("', ", '", ')

# replace a remaining ' that opens an item
text = text.replace(", '", ', "')

with open('data.txt', "w") as f:
    f.write(text)
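As a possible alternative to post-processing the file, the sketch below serializes each list on its own with json.dumps, which emits double-quoted strings. The question reports that json.dumps did not give the expected output, so this assumes the difference is calling it once per list and joining the two results with a tab, rather than calling it on the whole line. format_line is a hypothetical helper name and the sample values are made up.

```python
import json


def format_line(sentence, sentence_tags):
    """Return '<tokens as JSON list>\\t<tags as JSON list>' for one sentence."""
    # json.dumps writes list items with double quotes and a ", " separator.
    tokens_part = json.dumps(sentence.split())
    tags_part = json.dumps(sentence_tags)
    return tokens_part + "\t" + tags_part


# Hypothetical sample data standing in for the question's `text` and `tags`.
text = ["EU rejects German call"]
tags = [["I-ORG", "O", "I-MISC", "O"]]

for sentence, sentence_tags in zip(text, tags):
    print(format_line(sentence, sentence_tags))
```

Note that json.dumps escapes non-ASCII characters by default; passing ensure_ascii=False keeps them as written. Each half of a line written this way can also be read back with json.loads.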

Edited on 2021-04-5