How do I remove duplicate nodes from an XML document in Perl?

akram1rekik

I have a video sitemap XML file with duplicated nodes:

<?xml version="1.0"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:video="http://www.google.com/schemas/sitemap-video/1.1"> 
<url>
<loc>http://www.tubtun.com/video/Samsung_42Channel_Wireless_SoundStand</loc>
<video:video>
    <video:title>Samsung 42Channel Wireless SoundStand</video:title>
    <video:description>Samsung 4.2Channel Wireless SoundStand</video:description>
    <video:thumbnail_loc>http://www.tubtun.com/media/files_thumbnail/user91/pl_5364844b0dc.jpg</video:thumbnail_loc>
    <video:player_loc>http://www.tubtun.com/modules/vPlayer/vPlayer.swf?f=http://www.tubtun.com/modules/vPlayer/vPlayercfg.php?fid=844b0dc2c7258f4de11</video:player_loc>
    <video:publication_date>2015-01-27</video:publication_date>
</video:video>
</url>
<url>
<loc>http://www.tubtun.com/video/Samsung_42Channel_Wireless_SoundStand</loc>
<video:video>
    <video:title>Samsung 42Channel Wireless SoundStand</video:title>
    <video:description>Samsung 4.2Channel Wireless SoundStand</video:description>
    <video:thumbnail_loc>http://www.tubtun.com/media/files_thumbnail/user91/pl_5364844b0dc.jpg</video:thumbnail_loc>
    <video:player_loc>http://www.tubtun.com/modules/vPlayer/vPlayer.swf?f=http://www.tubtun.com/modules/vPlayer/vPlayercfg.php?fid=844b0dc2c7258f4de11</video:player_loc>
    <video:publication_date>2015-01-27</video:publication_date>
</video:video>
</url>
.....

I have written a Perl script to remove the duplicated data:

use strict;
use warnings;
use XML::LibXML;

my $file = 'sitemap.xml';
my $doc = XML::LibXML->load_xml( location => $file );

my %seen;
foreach my $uni ( $doc->findnodes('//url') ) {  # 'university' nodes only

    my $name = $uni->find('video:title');

    print "'$name' duplicated\n",
      $uni->unbindNode() if $seen{$name}++;  # Remove if seen before
}

$doc->toFile('clarified.xml'); # Print to file

Unfortunately, the file "clarified.xml" is the same as sitemap.xml.

I don't know what is wrong with my script.

Pradeep

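Your `//url` expression matches nothing because the <url> elements are in the default sitemap namespace, and an unprefixed name in XPath only matches elements that are in no namespace. Note also that video:title is a grandchild of <url> (it sits inside video:video), not a direct child. I got it working with the name() approach from https://stackoverflow.com/a/4817929/235961; here's the code: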

use strict;
use warnings;
use XML::LibXML;

my $file = 'sitemap.xml';
my $doc = XML::LibXML->load_xml( location => $file );

my %seen;
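foreach my $url ( $doc->findnodes("//*[name() ='url']") ) {  # elements named 'url', whatever their namespace

    # Relative path (leading '.'): search only inside this <url>, not the whole document
    my $name = $url->findvalue('.//video:title');
    if ( $seen{$name}++ ) {  # Remove if seen before
        print "'$name' duplicated\n";
        $url->unbindNode();
    }
}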

$doc->toFile('clarified.xml'); # Print to file
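
If you would rather not rely on name(), another option is to register a prefix for the default namespace with XML::LibXML::XPathContext and use it in the XPath. This is only a rough sketch: the `sm` prefix is a name I picked, and the inline sample data (example.com URLs, titles A and B) is made up so the snippet runs on its own. For the real file you would go back to `location => $file` and finish with `toFile`.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Made-up sample standing in for sitemap.xml (hypothetical data)
my $xml = <<'XML';
<?xml version="1.0"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
<url><loc>http://example.com/a</loc><video:video><video:title>A</video:title></video:video></url>
<url><loc>http://example.com/a</loc><video:video><video:title>A</video:title></video:video></url>
<url><loc>http://example.com/b</loc><video:video><video:title>B</video:title></video:video></url>
</urlset>
XML

my $doc = XML::LibXML->load_xml( string => $xml );

# Bind prefixes to the two namespace URIs used by the sitemap.
# 'sm' is an arbitrary prefix chosen for this sketch.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs( sm    => 'http://www.sitemaps.org/schemas/sitemap/0.9' );
$xpc->registerNs( video => 'http://www.google.com/schemas/sitemap-video/1.1' );

my %seen;
my @removed;
foreach my $url ( $xpc->findnodes('//sm:url') ) {

    # The second argument is the context node, so the path is relative to this <url>
    my $title = $xpc->findvalue( 'video:video/video:title', $url );
    if ( $seen{$title}++ ) {    # Title already seen: drop this <url>
        push @removed, $title;
        $url->unbindNode();
    }
}

# Count the <url> elements still attached to the document
my @left      = $xpc->findnodes('//sm:url');
my $remaining = scalar @left;

print "removed: @removed\n";
print "remaining <url> elements: $remaining\n";
```

Like the script above, this keys on the title. If two different pages can share a title, keying on `sm:loc` instead might be the safer choice.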

