I know how to fetch simple values from XML in the format below:
<note>
<col1>Tove</col1>
<col2>J</col2>
<test2>
<a> a </a>
<b> b </b>
<c> c </c>
<d> d </d>
</test2>
<code
a="1"
b="2"
c="3"
/>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
I have extracted the values like this:
for a in xmls.iter():  # iter() replaces the deprecated getiterator()
    b = a.find("col1")  # or "col2"
    if b is not None:
        print(b.text)  # this extracts the text value
        break
My problem is that I need to extract the values in the test2 and code nodes, but with the method above I get None as output.
The expected output is shown below, although getting the values directly (a, b, c, d and 1, 2, 3) would be best:
<a> a </a>
<b> b </b>
<c> c </c>
<d> d </d>
and
a="1"
b="2"
c="3"
What is the native way to extract these different kinds of values from XML when we have the target node name?
I would use lxml.etree, with .xpath() to get the text of the child nodes and .attrib to get the attribute values:
import lxml.etree as ET
data = """<note>
<col1>Tove</col1>
<col2>J</col2>
<test2>
<a> a </a>
<b> b </b>
<c> c </c>
<d> d </d>
</test2>
<code
a="1"
b="2"
c="3"
/>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
"""
tree = ET.fromstring(data)
for note in tree.xpath("//note"):
    test2_values = [value.strip() for value in note.xpath(".//test2/*/text()")]
    code_attrs = note.find("code").attrib
    print(test2_values)
    print(code_attrs)
Here, we iterate over all note nodes (assuming there may be several), collect the text of every node under the inner test2 node, and read all attributes of the code node.
Prints:
['a', 'b', 'c', 'd']
{'b': '2', 'c': '3', 'a': '1'}
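If lxml is not available, roughly the same result can be had with the standard library's xml.etree.ElementTree. The sketch below assumes you only know the target node name; extract_values is a hypothetical helper invented for illustration, and the XML is a shortened copy of the sample from the question. It also shows why .text alone comes back empty or None: for test2 it holds only the whitespace before the first child, and for the self-closing code node there is no text at all, so the values have to be read from the children and from .attrib.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample document from the question.
data = """<note>
<col1>Tove</col1>
<test2>
<a> a </a>
<b> b </b>
</test2>
<code a="1" b="2" c="3"/>
</note>"""


def extract_values(root, tag):
    """Hypothetical helper: gather text, child texts and attributes
    of the first descendant of `root` named `tag`."""
    node = root.find(".//" + tag)
    if node is None:
        return None
    return {
        # .text is None for empty elements, so fall back to "".
        "text": (node.text or "").strip(),
        # Direct children only, keyed by tag name.
        "children": {child.tag: (child.text or "").strip() for child in node},
        # Copy the attributes into a plain dict.
        "attributes": dict(node.attrib),
    }


root = ET.fromstring(data)
print(extract_values(root, "col1"))
print(extract_values(root, "test2"))
print(extract_values(root, "code"))
```

Returning all three kinds of value from one call means the caller does not need to know in advance whether the target node carries text, child nodes or attributes. Note that a dict keyed by tag name keeps only the last child when several children share a tag; a list of (tag, text) pairs would be the safer choice in that case.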