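I am trying to turn this list into a list of dictionaries. Below is an excerpt of the full list I am working with.
Taking row 4.1 as an example, I want:
I understand the general loop for putting each individual dictionary together. Where I am stuck is coming up with the regex patterns that capture the row title and the row values. It is difficult because some row titles also contain numbers. Another problem is that not every row ends with four numbers; in those cases I only want the numbers that are available. Any help working out the regex for this would be appreciated.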
clean = ['4.1 Properties occupied by the company (less $ 43,332,898 \nencumbrances) 68,122,291 0 68,122,291 64,237,046 \n',
'4.2 Properties held for the production of income (less \n $ encumbrances) 0 0 0 0 \n',
'4.3 Properties held for sale (less $ \nencumbrances) 0 0 \n',]
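import re

clean_list = []
for n in clean:
    # row number such as '4.1' at the start of the line
    row_num = re.match(r'\d+\.\d+', n).group()
    row_title = ...   # regex for the title still needed
    row_values = ...  # regex for the trailing numbers still needed
    new_dict = {row_num: (row_title, row_values)}
    clean_list.append(new_dict)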
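I am not sure why you would want a separate dictionary for each row, each with only a single key. I think ending up with one dictionary that has multiple keys would be more useful.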
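import re

d = {}
for line in clean:
    # group 1: row code, group 2: title (lazy), groups 3-6: one to four trailing numbers
    parts = re.match(r"^([\d.]+)\s+(.*?)\s+(\d[\d,.]*)\s*(?:(\d[\d,.]*)\s*)?(?:(\d[\d,.]*)\s*)?(?:(\d[\d,.]*)\s*)?$",
                     line, re.DOTALL)
    code, title, *values = parts.group(1, 2, 3, 4, 5, 6)
    # optional groups that did not match are None, so filter them out
    d[code] = (title, list(filter(None, values)))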
For the sample data, the value of d is:
{
'4.1': (
'Properties occupied by the company (less $ 43,332,898 \nencumbrances)',
['68,122,291', '0', '68,122,291', '64,237,046']
),
'4.2': (
'Properties held for the production of income (less \n $ encumbrances)',
['0', '0', '0', '0']
),
'4.3': (
'Properties held for sale (less $ \nencumbrances)',
['0', '0']
)
}
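If the list-of-dictionaries shape from the question is still what you need, here is a rough sketch of an alternative that avoids one long pattern: split each row on whitespace and peel numeric tokens off the right-hand end. The helper name `parse_row`, the `max_values` parameter, and the `'title'`/`'values'` keys are my own assumptions, not anything from the question. Note that, unlike the regex above, this collapses the embedded newlines in the title into single spaces and converts the values to integers.

```python
import re

# Sample rows copied from the question.
clean = [
    '4.1 Properties occupied by the company (less $ 43,332,898 \nencumbrances) 68,122,291 0 68,122,291 64,237,046 \n',
    '4.2 Properties held for the production of income (less \n $ encumbrances) 0 0 0 0 \n',
    '4.3 Properties held for sale (less $ \nencumbrances) 0 0 \n',
]

# A token counts as a number if it is digits with optional thousands separators.
NUMBER = re.compile(r'\d[\d,]*')

def parse_row(line, max_values=4):
    """Hypothetical helper: split one row into its code, title and integer values."""
    code, *tokens = line.split()
    values = []
    # Peel numeric tokens off the end of the row, taking at most max_values of them,
    # so a number inside the title (followed by more words) is left alone.
    while tokens and len(values) < max_values and NUMBER.fullmatch(tokens[-1]):
        values.append(int(tokens.pop().replace(',', '')))
    values.reverse()  # tokens were collected right to left
    return {code: {'title': ' '.join(tokens), 'values': values}}

clean_list = [parse_row(line) for line in clean]
```

Because the loop stops at the first non-numeric token from the right, rows with fewer than four trailing numbers simply yield a shorter `values` list.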