Extracting a dictionary from a string

Posted by debugcn in Dev

Mehdi Khlifi

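I am calling a function that returns a string containing a dictionary. How can I extract this dict, bearing in mind that the first and last lines may also contain "{" and "}"?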

This is a {testing string} example
This {is} a testing {string} example
{"website": "stackoverflow",
"type": "question",
"date": "10-09-2020"
}
This is a {testing string} example
This {is} a testing {string} example

I need to extract this value as a dict variable:

{"website": "stackoverflow",
"type": "question",
"date": "10-09-2020"
}

S3DEV

Updated answer

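Taking on board the comments from @martineau and @ekhumoro, the edited code below contains a function that searches the string and extracts all valid dicts. This is a more robust approach than my previous answer, since real-world dict content may vary, and this logic (hopefully) accounts for that.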

Sample code:

import json
import re

def extract_dict(s) -> list:
    """Extract all valid dicts from a string.
    
    Args:
        s (str): A string possibly containing dicts.
    
    Returns:
        A list containing all valid dicts.
    
    """
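    results = []
    # Flatten the string so a dict spanning several lines can be matched.
    s_ = ' '.join(s.split('\n')).strip()
    # Non-greedy match from each '{' to the nearest '}'.
    exp = re.compile(r'(\{.*?\})')
    for i in exp.findall(s_):
        try:
            results.append(json.loads(i))
        except json.JSONDecodeError:
            # Not valid JSON (e.g. "{testing string}"); skip it.
            pass
    return results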

Test string:

The OP's original string has been updated to include multiple dicts, a numeric value (as the last field), and a list value.

s = """
This is a {testing string} example
This {is} a testing {string} example
{"website": "stackoverflow",
"type": "question",
"date": 5
}
{"website": "stackoverflow",
"type": "question",
"date": "2020-09-11"
}
{"website": "stackoverflow",
"type": "question",
"dates": ["2020-09-11", "2020-09-12"]
}
This is a {testing string} example
This {is} a testing {string} example
"""

Output:

As the OP stated, there is typically only one dict in the string, so it can (obviously) be accessed using results[0].

>>> results = extract_dict(s)
>>> results

[{'website': 'stackoverflow', 'type': 'question', 'date': 5},
 {'website': 'stackoverflow', 'type': 'question', 'date': '2020-09-11'},
 {'website': 'stackoverflow', 'type': 'question', 'dates': ['2020-09-11', '2020-09-12']}]
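
One limitation worth noting: the non-greedy pattern stops at the first closing brace, so a dict that contains another dict is skipped. The following is a minimal sketch of one possible workaround, not part of the original answer, which uses the standard library's json.JSONDecoder.raw_decode to parse from each opening brace. The function name extract_dicts_nested is hypothetical.

```python
import json


def extract_dicts_nested(s: str) -> list:
    """Return every JSON object found in s, keeping nested dicts intact.

    Hypothetical helper for illustration only.
    """
    decoder = json.JSONDecoder()
    results = []
    pos = s.find('{')
    while pos != -1:
        try:
            # raw_decode parses one JSON value starting at index pos and
            # returns (value, end), where end is the index just past it.
            obj, end = decoder.raw_decode(s, pos)
        except json.JSONDecodeError:
            # No valid JSON starts at this brace; move on to the next one.
            pos = s.find('{', pos + 1)
            continue
        results.append(obj)
        # Resume the search after the object that was just parsed.
        pos = s.find('{', end)
    return results
```

Because the parser decides where each object ends, there is no need to flatten newlines first, and text such as {testing string} is simply skipped.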

Original answer:

Ignore this section. Although the code works, it is tailored specifically to the OP's requirement and is not robust for other uses.

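This example uses a regex to identify the start {" and end "} of the dict, extracts the content in between, and then converts the string to a proper dict. Because newlines get in the way and complicate the regex, I simply flattened the string first.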

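Following @jizhihaoSAMA's comment, I've updated the answer to use json.loads to convert the string to a dict, as it's cleaner. If you don't want the additional import, eval would also work, but it is not recommended.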

Sample code:

import json
import re

s = """
This is a {testing string} example
This {is} a testing {string} example
{"website": "stackoverflow",
"type": "question",
"date": "10-09-2020"
}
This is a {testing string} example
This {is} a testing {string} example
"""

s_ = ' '.join(s.split('\n')).strip()
d = json.loads(re.findall(r'(\{\".*\"\s?\})', s_)[0])

>>> d
>>> d['website']

Output:

{'website': 'stackoverflow', 'type': 'question', 'date': '10-09-2020'}

'stackoverflow'
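
On the point about eval: if the string holds a Python-style literal rather than strict JSON (single quotes, for instance), the standard library's ast.literal_eval is a commonly suggested, more restricted alternative. The sketch below is an illustration under that assumption and is not part of the original answer. The sample text is made up.

```python
import ast

# A Python-literal dict with single quotes, which json.loads would reject.
text = "{'website': 'stackoverflow', 'date': 5}"

# literal_eval only accepts literals (strings, numbers, lists, dicts, ...),
# so it cannot run arbitrary expressions the way eval can.
d = ast.literal_eval(text)
```

Note that literal_eval does not understand the JSON-only tokens true, false and null, so json.loads remains the better fit for genuine JSON.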

Edited on 2021-04-5