I have the following XML:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:news="http://www.google.com/schemas/sitemap-news/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://news.mycoolsite.com/city/newyork/cat-bites-dog/articleshow/12345.pms</loc>
    <news:news>
      <news:publication>
        <news:name>New York Post</news:name>
        <news:language>en</news:language>
      </news:publication>
      <news:publication_date>2017-12-27T07:23:12+03:30</news:publication_date>
      <news:title>Cat bites dog</news:title>
      <news:keywords>Cat biting,Dog,Fluffy,Pongo,Broadway,Cat attack,</news:keywords>
    </news:news>
    <lastmod>2017-12-27T10:17:04+03:30</lastmod>
    <image:image>
      <image:loc>https://news.mycoolsite.com/city/newyork/cat-bites-dog/photo/12345.pms</image:loc>
    </image:image>
  </url>
</urlset>
How do I get the 'loc' tag using ElementTree?
I have tried the following:
import xml.etree.ElementTree as ET

tree = ET.parse('data.xml')
root = tree.getroot()

for child in root:
    print(child.tag, child.attrib)

tags = []
for loc in root.iter('loc'):
    print(loc.tag)
    tags.append(loc)
print("Found so many tags: " + str(len(tags)))
But the problem is that it doesn't seem to find any tags! What is the problem? Does it have anything to do with the namespaces used?
EDIT: If I delete the namespaces, then I do find both loc tags. So the problem seems to be that I am not specifying the namespaces correctly. But the first loc tag doesn't have a namespace prefix, so how do I specify the namespace correctly?
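A quick way to see what ElementTree is matching against is to print every tag it produced. The sketch below inlines a trimmed copy of the sitemap (an assumption, so it runs without data.xml) and shows that even the unprefixed <loc> is stored with a namespace URI in front of it.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sitemap above, inlined instead of reading data.xml (assumption)
SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://news.mycoolsite.com/city/newyork/cat-bites-dog/articleshow/12345.pms</loc>
    <image:image>
      <image:loc>https://news.mycoolsite.com/city/newyork/cat-bites-dog/photo/12345.pms</image:loc>
    </image:image>
  </url>
</urlset>"""

root = ET.fromstring(SITEMAP)

# Each tag is stored as "{namespace-uri}localname"; unprefixed elements
# inherit the default namespace declared by xmlns="..." on <urlset>.
all_tags = [elem.tag for elem in root.iter()]
for tag in all_tags:
    print(tag)

# No element's tag is literally 'loc', so a bare 'loc' lookup finds nothing.
bare_matches = list(root.iter('loc'))
print(len(bare_matches))
```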
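The unprefixed <loc> is not namespace-free: it inherits the default namespace declared by xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" on <urlset>, so ElementTree names it {http://www.sitemaps.org/schemas/sitemap/0.9}loc. You have to include that default namespace in the tag you search for: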
import xml.etree.ElementTree as ET

tree = ET.parse('data.xml')
root = tree.getroot()

for child in root:
    print(child.tag, child.attrib)

# Unprefixed elements live in the default (sitemap) namespace.
# This matches <loc> only; <image:loc> is in the image namespace.
tags = []
for loc in root.iter('{http://www.sitemaps.org/schemas/sitemap/0.9}loc'):
    print(loc.tag)
    tags.append(loc)
print("Found so many tags: " + str(len(tags)))
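If you need elements from several namespaces, passing a prefix-to-URI mapping to findall() is usually tidier than spelling out the {uri} form each time. A minimal sketch, again on an inlined, trimmed copy of the sitemap (assumption); the prefixes sm and image are arbitrary names chosen here, only the URIs have to match the document. The {*} wildcard needs Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sitemap above, inlined instead of reading data.xml (assumption)
SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://news.mycoolsite.com/city/newyork/cat-bites-dog/articleshow/12345.pms</loc>
    <image:image>
      <image:loc>https://news.mycoolsite.com/city/newyork/cat-bites-dog/photo/12345.pms</image:loc>
    </image:image>
  </url>
</urlset>"""

root = ET.fromstring(SITEMAP)

# Prefixes are our own choice; ElementTree resolves them through this dict.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "image": "http://www.google.com/schemas/sitemap-image/1.1",
}

# <loc> directly under each <url> (default sitemap namespace)
page_locs = [el.text for el in root.findall("sm:url/sm:loc", NS)]

# <image:loc> anywhere below the root
image_locs = [el.text for el in root.findall(".//image:loc", NS)]

# Python 3.8+: '{*}' matches the local name in any namespace, so both are found
any_locs = [el.text for el in root.findall(".//{*}loc")]

print(page_locs)
print(image_locs)
print(len(any_locs))
```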