Using regular expressions to split a string into a dictionary with keys and values

Posted by debugcn in Dev

斯坦尼斯洛斯

I'm trying to turn this list into a list of dictionaries. Below is a sample of the full list I'm working with.

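Using row 4.1 as an example, I want:

the key to be the row number ('4.1'),
the value to contain the title ('Properties occupied by  the company (less $  43,332,898 \nencumbrances)'),
together with the four numbers that follow it, as a list ['68,122,291', '0', '68,122,291', '64,237,046'].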

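I understand the general loop for assembling each individual dictionary. Where I'm struggling is coming up with the regex patterns to capture the row title and the row values. This is difficult because some row titles also contain numbers. Another problem is that not every row ends with four numbers; in those cases I just want whatever numbers are there. Any help working out the regex to capture these would be appreciated.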

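    import re

    clean = ['4.1 Properties occupied by  the company (less $  43,332,898 \nencumbrances)  68,122,291  0  68,122,291  64,237,046 \n',
             '4.2 Properties held for  the production of income (less \n $    encumbrances)  0  0   0  0 \n',
             '4.3 Properties held for sale (less $  \nencumbrances)      0  0 \n']

    clean_list = []

    for n in clean:
        # re.match returns a single match object; re.findall would return a
        # list, and a list cannot be used as a dictionary key.
        row_num = re.match(r'[\d.]+', n).group()  # e.g. '4.1'
        row_title = None   # TODO: regex for the row title
        row_values = None  # TODO: regex for the (up to four) trailing numbers
        new_dict = {row_num: [row_title, row_values]}
        clean_list.append(new_dict)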
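One possible way to handle both difficulties described above (digits inside some titles, and a variable number of trailing values) is to anchor on the end of the line: find the run of whitespace-separated numbers that reaches the end, and treat everything between the row number and that run as the title. This is a sketch of an alternative technique, not taken from the question or the answer; the names `split_row`, `ROW_NUMBER`, `TRAILING_VALUES`, `sample` and `rows` are hypothetical, and it assumes values are separated by whitespace and made only of digits, commas and dots.

```python
import re

# Hypothetical helper (not from the question or the answer). Idea: locate the
# run of whitespace-separated numbers at the END of the line, so numbers inside
# the title (such as 43,332,898) are never mistaken for values.
ROW_NUMBER = re.compile(r"\s*([\d.]+)\s+")
TRAILING_VALUES = re.compile(r"(?:\s+\d[\d,.]*)+\s*$")


def split_row(line):
    """Return (row number, title, list of trailing values) for one line."""
    num = ROW_NUMBER.match(line)
    if num is None:
        raise ValueError(f"no row number at the start of {line!r}")
    body_start = num.end()
    # search() returns the leftmost position from which the pattern can reach
    # the end of the line, i.e. the start of the trailing run of numbers.
    tail = TRAILING_VALUES.search(line, body_start)
    if tail is None:  # no values at all: the rest of the line is the title
        return num.group(1), line[body_start:].strip(), []
    title = line[body_start:tail.start()].strip()
    return num.group(1), title, tail.group().split()


sample = [
    '4.1 Properties occupied by  the company (less $  43,332,898 \nencumbrances)  68,122,291  0  68,122,291  64,237,046 \n',
    '4.2 Properties held for  the production of income (less \n $    encumbrances)  0  0   0  0 \n',
    '4.3 Properties held for sale (less $  \nencumbrances)      0  0 \n',
]

rows = {}
for line in sample:
    code, title, values = split_row(line)
    rows[code] = (title, values)

print(rows['4.3'])
# ('Properties held for sale (less $  \nencumbrances)', ['0', '0'])
```

Because the values are found from the right, there is no upper limit of four. The trade-off is that a title which itself ends in a whitespace-separated number would lose that number to the values; the data alone cannot resolve that ambiguity.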

亭子

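Not sure why you want a separate dictionary for each row, each with only one key. I think it is more useful to end up with a single dictionary with multiple keys: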

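    import re

    d = {}
    for line in clean:
        parts = re.match(r"^([\d.]+)\s+(.*?)\s+(\d[\d,.]*)\s*(?:(\d[\d,.]*)\s*)?(?:(\d[\d,.]*)\s*)?(?:(\d[\d,.]*)\s*)?$",
                         line, re.DOTALL)
        if parts is None:  # skip lines that don't fit the expected layout
            continue
        # groups 4-6 are None for rows with fewer values; filter(None, ...) drops them
        code, title, *values = parts.group(1, 2, 3, 4, 5, 6)
        d[code] = (title, list(filter(None, values)))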
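To make the pattern in the answer easier to read, here is the same regular expression rewritten with `re.VERBOSE` so each part can carry a comment. The matching behaviour is intended to be identical; the name `ROW` and the test line are only for illustration. The key points are that the title group `(.*?)` is lazy, so it stops at the first position from which one to four values can reach the end of the line, and that `re.DOTALL` lets the title span the embedded newline.

```python
import re

# The answer's regular expression, rewritten with re.VERBOSE. VERBOSE ignores
# unescaped whitespace and treats '#' as the start of a comment, which is safe
# here because the pattern only matches whitespace through the \s class.
ROW = re.compile(
    r"""
    ^([\d.]+)              # group 1: row number, e.g. 4.1
    \s+
    (.*?)                  # group 2: title; lazy, so it stops before the values
    \s+
    (\d[\d,.]*)\s*         # group 3: first value (required)
    (?:(\d[\d,.]*)\s*)?    # group 4: optional second value
    (?:(\d[\d,.]*)\s*)?    # group 5: optional third value
    (?:(\d[\d,.]*)\s*)?    # group 6: optional fourth value
    $                      # end of input; trailing whitespace was consumed above
    """,
    re.DOTALL | re.VERBOSE,  # DOTALL lets the title cross the embedded newline
)

line = '4.3 Properties held for sale (less $  \nencumbrances)      0  0 \n'
print(ROW.match(line).groups())
# ('4.3', 'Properties held for sale (less $  \nencumbrances)', '0', '0', None, None)
```

The two `None` entries are the unused optional groups, which is why the answer filters them out.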

For the sample data, the value of d is:

    {
      '4.1': (
        'Properties occupied by  the company (less $  43,332,898 \nencumbrances)', 
        ['68,122,291', '0', '68,122,291', '64,237,046']
      ), 
      '4.2': (
        'Properties held for  the production of income (less \n $    encumbrances)',
        ['0', '0', '0', '0']
      ), 
      '4.3': (
        'Properties held for sale (less $  \nencumbrances)', 
        ['0', '0']
      )
    }
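If the list of single-key dictionaries from the question is still needed, it can be derived from `d` in one comprehension. This sketch assumes the question meant each value to pair the title with the list of numbers (`[title, values]`); `clean_list` mirrors the question's variable name and `row_42` is illustrative. It also shows the practical difference the answer points out: a row is a direct lookup in `d`, but needs a linear scan in the list.

```python
# d as printed in the answer for the three sample rows.
d = {
    '4.1': ('Properties occupied by  the company (less $  43,332,898 \nencumbrances)',
            ['68,122,291', '0', '68,122,291', '64,237,046']),
    '4.2': ('Properties held for  the production of income (less \n $    encumbrances)',
            ['0', '0', '0', '0']),
    '4.3': ('Properties held for sale (less $  \nencumbrances)',
            ['0', '0']),
}

# The question's shape: one single-key dict per row, value = [title, values].
# (Assumption: pairing title and values in a list is what the question meant.)
clean_list = [{code: [title, values]} for code, (title, values) in d.items()]

# Looking up row 4.2: a direct key access with d, a scan with clean_list.
title, values = d['4.2']
row_42 = next(item['4.2'] for item in clean_list if '4.2' in item)
print(values == row_42[1])  # True
```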
