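I have inherited some XML that I need to process in Python. I am using xml.etree.cElementTree, and I am having trouble associating the text that occurs after an empty element with that empty element's tag. The XML is considerably more complicated than what I have pasted below, but I have simplified it to make the problem clearer (I hope!).
The result I would like is a dictionary like this:
Desired result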
{(9, 1): 'As they say, A student has usually three maladies:', (9, 2): 'poverty, itch, and pride.'}
The tuples could also contain strings (e.g. ('9', '1')). I really don't care at this early stage.
Here is the XML:
test1.xml
<div1 type="chapter" num="9">
<p>
<section num="1"/> <!-- The empty element -->
As they say, A student has usually three maladies: <!-- Here lies the trouble -->
<section num="2"/> <!-- Another empty element -->
poverty, itch, and pride.
</p>
</div1>
What I have tried
Attempt 1
>>> import xml.etree.cElementTree as ET
>>> tree = ET.parse('test1.xml')
>>> root = tree.getroot()
>>> chapter = root.attrib['num']
>>> d = dict()
>>> for p in root:
...     for section in p:
...         d[(int(chapter), int(section.attrib['num']))] = section.text
...
>>> d
{(9, 2): None, (9, 1): None} # This of course makes sense, since the elements are empty
Attempt 2
>>> for p in root:
...     for section, text in zip(p, p.itertext()): # unfortunately, p and p.itertext() are two different lengths, which also makes sense
...         d[(int(chapter), int(section.attrib['num']))] = text.strip()
...
>>> d
{(9, 2): 'As they say, A student has usually three maladies:', (9, 1): ''}
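As you can see in the second attempt, p and p.itertext() have two different lengths. The value stored under (9, 2) is the one I want to associate with the key (9, 1), and the value I want to associate with (9, 2) does not even appear in d (because zip stops at the end of the shorter sequence, p, and so cuts off the longer p.itertext()).
Any help would be appreciated. Thanks in advance.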
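Have you tried using .tail? In ElementTree, the text that follows an element's end tag, up to the next tag, is stored on that element as its .tail attribute rather than as .text.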
import xml.etree.cElementTree as ET
txt = """<div1 type="chapter" num="9">
<p>
<section num="1"/> <!-- The empty element -->
As they say, A student has usually three maladies: <!-- Here lies the trouble -->
<section num="2"/> <!-- Another empty element -->
poverty, itch, and pride.
</p>
</div1>"""
root = ET.fromstring(txt)
# Each <section/> is empty, so the text that follows it lives in its .tail
for p in root:
    for s in p:
        print s.attrib['num'], s.tail
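The snippet above only prints each tail. As a minimal sketch of how the dictionary from the question could be built on top of it, the version below targets Python 3 and therefore imports xml.etree.ElementTree (cElementTree was removed in Python 3.9). The helper name sections_by_tail is hypothetical, and the code assumes every section element sits somewhere under a root element that carries the chapter number, as in the simplified sample.

```python
import xml.etree.ElementTree as ET

XML = """<div1 type="chapter" num="9">
<p>
<section num="1"/> <!-- The empty element -->
As they say, A student has usually three maladies: <!-- Here lies the trouble -->
<section num="2"/> <!-- Another empty element -->
poverty, itch, and pride.
</p>
</div1>"""


def sections_by_tail(root):
    """Map (chapter, section) to the text that follows each empty <section/>.

    Hypothetical helper: assumes `root` carries the chapter number in its
    `num` attribute, as the <div1> element does in the sample.
    """
    chapter = int(root.attrib['num'])
    result = {}
    for section in root.iter('section'):
        # .tail is None when nothing follows the element, so fall back to ''
        text = (section.tail or '').strip()
        result[(chapter, int(section.attrib['num']))] = text
    return result


d = sections_by_tail(ET.fromstring(XML))
print(d)
```

The default parser discards the XML comments, so the text on either side of a comment ends up in the same tail, and strip() removes the surrounding whitespace and newlines. If the real document has text that continues after a nested child element, that later text belongs to the child's tail instead and would need to be collected separately.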