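I'm trying to build a dictionary from text I retrieved from a website, using some kind of loop and a regular expression. I'd like the dictionary to look like this: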
{36:30281, 36 2/3:30282, 37:30283, 37 1/3: 30283, 38:30284 etc..}
This is the text I retrieved from the website:
[option value="-1">Choose size</option>, option value="30281">\r\n\t\t\t\t\t\t\t\t\t36\r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t/option>, option value="30282">\r\n\t\t\t\t\t\t\t\t\t36 2/3\r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t/option, option value="30283"\r\n\t\t\t\t\t\t\t\t\t37 1/3\r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t/option, option value="30284">\r\n\t\t\t\t\t\t\t\t\t38\r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t</option>, option value="30285">\r\n\t\t\t\t\t\t\t\t\t38 2/3\r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t</option>, option value="30286">\r\n\t\t\t\t\t\t\t\t\t39 1/3\r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t</option>, option value="30287">\r\n\t\t\t\t\t\t\t\t\t40\r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t</option>, option value="30288">\r\n\t\t\t\t\t\t\t\t\t40 2/3\r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t</option>, option value="30289">\r\n\t\t\t\t\t\t\t\t\t41 1/3\r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t</option>]
I'm not very good with regular expressions. Could anyone suggest a solution that would help me do this?
Thanks
You could use the following pattern:
value=\"(\d+)\"\D*(\d+(?:\ [\d/]+)?)
In Python, this would be (using a dict comprehension):
import re
junk_string = """
[option value="-1">Choose size</option>, option value="30281">\r\n\t\t\t\t\t\t\t\t\t36\r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t/option>, option value="30282">\r\n\t\t\t\t\t\t\t\t\t36 2/3\r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t/option, option value="30283"\r\n\t\t\t\t\t\t\t\t\t37 1/3\r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t/option, option value="30284">\r\n\t\t\t\t\t\t\t\t\t38\r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t</option>, option value="30285">\r\n\t\t\t\t\t\t\t\t\t38 2/3\r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t</option>, option value="30286">\r\n\t\t\t\t\t\t\t\t\t39 1/3\r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t</option>, option value="30287">\r\n\t\t\t\t\t\t\t\t\t40\r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t</option>, option value="30288">\r\n\t\t\t\t\t\t\t\t\t40 2/3\r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t</option>, option value="30289">\r\n\t\t\t\t\t\t\t\t\t41 1/3\r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t</option>]
"""
rx = re.compile(r'value=\"(\d+)\"\D*(\d+(?:\ [\d/]+)?)')
result = {m.group(2): m.group(1)
          for m in rx.finditer(junk_string)}
print(result)
# {'36': '30281', '36 2/3': '30282', '37 1/3': '30283', '38': '30284', '38 2/3': '30285', '39 1/3': '30286', '40': '30287', '40 2/3': '30288', '41 1/3': '30289'}
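To make the pattern easier to read, here is a minimal sketch of the same expression written with `re.VERBOSE`, so each part carries a comment. The names `SIZE_RX`, `sizes_to_ids` and `sample` are made up for illustration, and `sample` is a shortened copy of the text from the question. Converting the option values to `int` is an assumption based on the unquoted numbers in the desired output.

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE.
# In verbose mode a literal space must be written as [ ] (or escaped).
SIZE_RX = re.compile(r'''
    value="(\d+)"        # group 1: numeric option value, e.g. 30281
    \D*                  # skip everything up to the next digit
    (\d+(?:[ ][\d/]+)?)  # group 2: the size, e.g. 36 or 36 2/3
''', re.VERBOSE)


def sizes_to_ids(text):
    """Map each size label (str) to its option value (int)."""
    return {m.group(2): int(m.group(1)) for m in SIZE_RX.finditer(text)}


# Shortened sample; value="-1" is skipped because "-" is not a digit.
sample = ('option value="-1">Choose size</option>, '
          'option value="30281">\r\n\t\t36\r\n\t\t/option>, '
          'option value="30282">\r\n\t\t36 2/3\r\n\t\t/option')

print(sizes_to_ids(sample))
# {'36': 30281, '36 2/3': 30282}
```

The keys stay strings because a label such as `36 2/3` cannot be turned into an integer.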
However, as already noted in the comments, this is not really plain text but part of a DOM, so at least consider using a parser.
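As a rough sketch of the parser route, the following uses `html.parser.HTMLParser` from the Python standard library. It assumes you still have the original, well-formed HTML of the `<select>` element rather than the stringified list shown in the question; the `markup` sample and the `OptionCollector` class are hypothetical names written for this illustration.

```python
from html.parser import HTMLParser


class OptionCollector(HTMLParser):
    """Collect {option text: value attribute} from <option> elements."""

    def __init__(self):
        super().__init__()
        self.sizes = {}
        self._value = None  # value of the <option> currently open, if any
        self._text = []     # text chunks seen inside that <option>

    def handle_starttag(self, tag, attrs):
        if tag == "option":
            self._value = dict(attrs).get("value")
            self._text = []

    def handle_data(self, data):
        if self._value is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "option" and self._value is not None:
            label = "".join(self._text).strip()
            # Skip the "Choose size" placeholder, whose value is -1.
            if self._value.isdigit():
                self.sizes[label] = int(self._value)
            self._value = None


# Assumed input: a shortened, well-formed version of the page's markup.
markup = ('<select><option value="-1">Choose size</option>'
          '<option value="30281">\r\n\t\t36\r\n\t</option>'
          '<option value="30282">\r\n\t\t36 2/3\r\n\t</option></select>')

collector = OptionCollector()
collector.feed(markup)
collector.close()
print(collector.sizes)
# {'36': 30281, '36 2/3': 30282}
```

If the page is already being fetched with a third-party HTML library, its own element lookup would likely be shorter than this; the standard-library version is shown only because it needs no extra install.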