Merge child nodes that share the same parent node, xml, python

mr.M

I have the following xml file:

<root>
    <article_date>09/09/2013
    <article_time>1
        <article_name>aaa1</article_name>
        <article_link>1aaaaaaa</article_link>
    </article_time>
    <article_time>0
        <article_name>aaa2</article_name>
        <article_link>2aaaaaaa</article_link>
    </article_time>
    <article_time>1
        <article_name>aaa3</article_name>
        <article_link>3aaaaaaa</article_link>
    </article_time>
    <article_time>0
        <article_name>aaa4</article_name>
        <article_link>4aaaaaaa</article_link>
    </article_time>
    <article_time>1
        <article_name>aaa5</article_name>
        <article_link>5aaaaaaa</article_link>
    </article_time>
    </article_date>
</root>

I would like to transform it to the following file:

<root>
    <article_date>09/09/2013
    <article_time>1
        <article_name>aaa1+aaa3+aaa5</article_name>
        <article_link>1aaaaaaa+3aaaaaaa+5aaaaaaa</article_link>
    </article_time>
    <article_time>0
        <article_name>aaa2+aaa4</article_name>
        <article_link>2aaaaaaa+4aaaaaaa</article_link>
    </article_time>
    </article_date>
</root>

How can I do it in python?

My approach to this task is the following: 1) loop through the article_time tags; 2) build a dictionary whose key is either 0 or 1; 3) for each entry in this dictionary, find all the article_name and article_link child nodes and append them.

Based on that, I wrote the following code (PS: I am currently struggling with adding elements to the dictionary, but I will overcome this issue):

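import os
import xml.etree.ElementTree as et


def parse():
    list_of_unique_timestamps = []
    text_to_merge = ""
    # et.parse does not expand "~", so expand it explicitly
    tree = et.parse(os.path.expanduser("~/Documents/test1.xml"))
    root = tree.getroot()
    for children in root:  # article_date elements
        print(children.tag, children.text)
        for child in children:  # article_time elements
            timestamp = child.text.strip()  # drop the surrounding whitespace
            print(child.tag, int(timestamp))
            if timestamp not in list_of_unique_timestamps:
                list_of_unique_timestamps.append(timestamp)
    print(list_of_unique_timestamps)

alecxe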

Here's a solution using xml.etree.ElementTree from the Python standard library.

The idea is to gather items into a defaultdict(list) keyed by the article_time text value:

from collections import defaultdict
import xml.etree.ElementTree as ET

data = """<root>
    <article_date>09/09/2013
    <article_time>1
        <article_name>aaa1</article_name>
        <article_link>1aaaaaaa</article_link>
    </article_time>
    <article_time>0
        <article_name>aaa2</article_name>
        <article_link>2aaaaaaa</article_link>
    </article_time>
    <article_time>1
        <article_name>aaa3</article_name>
        <article_link>3aaaaaaa</article_link>
    </article_time>
    <article_time>0
        <article_name>aaa4</article_name>
        <article_link>4aaaaaaa</article_link>
    </article_time>
    <article_time>1
        <article_name>aaa5</article_name>
        <article_link>5aaaaaaa</article_link>
    </article_time>
    </article_date>
</root>
"""

tree = ET.fromstring(data)

root = ET.Element('root')
article_date = ET.SubElement(root, 'article_date')
article_date.text = tree.find('.//article_date').text

data = defaultdict(list)
for article_time in tree.findall('.//article_time'):
    text = article_time.text.strip()
    name = article_time.find('./article_name').text
    link = article_time.find('./article_link').text
    data[text].append((name, link))

for time_value, items in data.items():
    article_time = ET.SubElement(article_date, 'article_time')
    article_name = ET.SubElement(article_time, 'article_name')
    article_link = ET.SubElement(article_time, 'article_link')

    article_time.text = time_value
    article_name.text = '+'.join(name for (name, _) in items)
    article_link.text = '+'.join(link for (_, link) in items)

# encoding='unicode' makes tostring() return str instead of bytes on Python 3
print(ET.tostring(root, encoding='unicode'))

prints (prettified):

<root>
    <article_date>09/09/2013
        <article_time>1
            <article_name>aaa1+aaa3+aaa5</article_name>
            <article_link>1aaaaaaa+3aaaaaaa+5aaaaaaa</article_link>
        </article_time>
        <article_time>0
            <article_name>aaa2+aaa4</article_name>
            <article_link>2aaaaaaa+4aaaaaaa</article_link>
        </article_time>
    </article_date>
</root>

The result is exactly what you were aiming for.
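
If the file can contain more than one article_date, the same grouping idea can be wrapped in a function that merges within each date separately. This is only a minimal sketch for Python 3.7+ (where dicts keep insertion order). It assumes every article_time has at most one article_name and one article_link child; the function name merge_article_times and the sep parameter are my own and not part of any library:

```python
from collections import defaultdict
import xml.etree.ElementTree as ET


def merge_article_times(xml_text, sep="+"):
    """Hypothetical helper: merge article_time nodes that share the same text."""
    source = ET.fromstring(xml_text)
    merged_root = ET.Element("root")

    for src_date in source.findall("article_date"):
        new_date = ET.SubElement(merged_root, "article_date")
        new_date.text = (src_date.text or "").strip()

        # Group (name, link) pairs by the stripped article_time text.
        groups = defaultdict(list)
        for src_time in src_date.findall("article_time"):
            key = (src_time.text or "").strip()
            name = src_time.findtext("article_name", default="")
            link = src_time.findtext("article_link", default="")
            groups[key].append((name, link))

        # Emit one article_time per distinct key, in first-seen order.
        for key, items in groups.items():
            new_time = ET.SubElement(new_date, "article_time")
            new_time.text = key
            new_name = ET.SubElement(new_time, "article_name")
            new_link = ET.SubElement(new_time, "article_link")
            new_name.text = sep.join(name for name, _ in items)
            new_link.text = sep.join(link for _, link in items)

    return merged_root


sample = """<root>
    <article_date>09/09/2013
    <article_time>1
        <article_name>aaa1</article_name>
        <article_link>1aaaaaaa</article_link>
    </article_time>
    <article_time>0
        <article_name>aaa2</article_name>
        <article_link>2aaaaaaa</article_link>
    </article_time>
    <article_time>1
        <article_name>aaa3</article_name>
        <article_link>3aaaaaaa</article_link>
    </article_time>
    </article_date>
</root>
"""

result = merge_article_times(sample)
print(ET.tostring(result, encoding="unicode"))
```

Grouping per article_date keeps entries from different dates apart, which the single-date version above does not need to worry about.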
