How to remove nodes from an XML file from the command line?

Felix

I have an XML file that contains the element <w:rPr> several times. It is used like this:

<w:rPr><w:rFonts w:ascii="Symbol" w:hAnsi="Symbol" w:hint="default"/></w:rPr>

However, the content between the opening and closing tags varies. Is there a way to use sed or something else to delete everything between <w:rPr> and </w:rPr>, along with both tags?

The relevant namespace

xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"

And part of the file itself (formatted, valid XML)

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:numbering xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:cx="http://schemas.microsoft.com/office/drawing/2014/chartex" xmlns:cx1="http://schemas.microsoft.com/office/drawing/2015/9/8/chartex" xmlns:cx2="http://schemas.microsoft.com/office/drawing/2015/10/21/chartex" xmlns:cx3="http://schemas.microsoft.com/office/drawing/2016/5/9/chartex" xmlns:cx4="http://schemas.microsoft.com/office/drawing/2016/5/10/chartex" xmlns:cx5="http://schemas.microsoft.com/office/drawing/2016/5/11/chartex" xmlns:cx6="http://schemas.microsoft.com/office/drawing/2016/5/12/chartex" xmlns:cx7="http://schemas.microsoft.com/office/drawing/2016/5/13/chartex" xmlns:cx8="http://schemas.microsoft.com/office/drawing/2016/5/14/chartex" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:aink="http://schemas.microsoft.com/office/drawing/2016/ink" xmlns:am3d="http://schemas.microsoft.com/office/drawing/2017/model3d" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" xmlns:w16cid="http://schemas.microsoft.com/office/word/2016/wordml/cid" xmlns:w16se="http://schemas.microsoft.com/office/word/2015/wordml/symex" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" 
xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" mc:Ignorable="w14 w15 w16se w16cid wp14">
  <w:abstractNum w:abstractNumId="0" w15:restartNumberingAfterBreak="0">
    <w:nsid w:val="FFFFFF89"/>
    <w:multiLevelType w:val="singleLevel"/>
    <w:tmpl w:val="CB2CEC0E"/>
    <w:lvl w:ilvl="0">
      <w:start w:val="1"/>
      <w:numFmt w:val="bullet"/>
      <w:pStyle w:val="Aufzhlungszeichen"/>
      <w:lvlText w:val="&#xF0B7;"/>
      <w:lvlJc w:val="left"/>
      <w:pPr>
        <w:tabs>
          <w:tab w:val="num" w:pos="360"/>
        </w:tabs>
        <w:ind w:left="360" w:hanging="360"/>
      </w:pPr>
      <w:rPr>
        <w:rFonts w:ascii="Symbol" w:hAnsi="Symbol" w:hint="default"/>
      </w:rPr>
    </w:lvl>
  </w:abstractNum>

  <!-- ... -->

 <w:abstractNum w:abstractNumId="16" w15:restartNumberingAfterBreak="0">
    <w:nsid w:val="6F8046F9"/>
    <w:multiLevelType w:val="hybridMultilevel"/>
    <w:tmpl w:val="1F3A6CE4"/>
    <w:lvl w:ilvl="0" w:tplc="DE32BBA8">
      <w:start w:val="1"/>
      <w:numFmt w:val="lowerLetter"/>
      <w:lvlText w:val="%1)"/>
      <w:lvlJc w:val="left"/>
      <w:pPr>
        <w:ind w:left="682" w:hanging="567"/>
      </w:pPr>
      <w:rPr>
        <w:rFonts w:ascii="Arial" w:eastAsia="Arial" w:hAnsi="Arial" w:cs="Arial" w:hint="default"/>
        <w:spacing w:val="-1"/>
        <w:w w:val="100"/>
        <w:sz w:val="22"/>
        <w:szCs w:val="22"/>
        <w:lang w:val="de-DE" w:eastAsia="de-DE" w:bidi="de-DE"/>
      </w:rPr>
    </w:lvl>

    <!-- ... -->

    <w:lvl w:ilvl="8" w:tplc="E4341C34">
      <w:numFmt w:val="bullet"/>
      <w:lvlText w:val="•"/>
      <w:lvlJc w:val="left"/>
      <w:pPr>
        <w:ind w:left="7581" w:hanging="567"/>
      </w:pPr>
      <w:rPr>
        <w:rFonts w:hint="default"/>
        <w:lang w:val="de-DE" w:eastAsia="de-DE" w:bidi="de-DE"/>
      </w:rPr>
    </w:lvl>
  </w:abstractNum>

  <!-- ... -->

  <w:num w:numId="1">
    <w:abstractNumId w:val="15"/>
  </w:num>
  <w:num w:numId="2">
    <w:abstractNumId w:val="6"/>
  </w:num>

  <!-- ... -->

</w:numbering>

Gilles Quenot

Sure, this is a job for xmlstarlet (a proper XML parser) and its friend XPath, like this:

xmlstarlet ed -L \
              -N w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" \
              -d '//w:rPr' file.xml

Some explanation:

  • -L edits the file in place, like sed -i
  • -N binds a namespace prefix to its URI (needed here because the elements are in the w namespace)
  • -d deletes the nodes matching the XPath expression

See xmlstarlet edit --help for details.
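For readers without xmlstarlet, the same deletion can be sketched in Python. This is only an illustration, not part of the command above: it swaps in the standard-library xml.etree.ElementTree, and the helper name remove_elements and the trimmed SAMPLE document are hypothetical.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Keep the "w" prefix on output instead of an auto-generated "ns0".
ET.register_namespace("w", W_NS)


def remove_elements(root, qualified_tag):
    """Detach every element named qualified_tag (with its subtree); return the count."""
    removed = 0
    # ElementTree elements have no parent pointer, so visit each potential
    # parent and inspect a snapshot of its children.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == qualified_tag:
                parent.remove(child)
                removed += 1
    return removed


# Hypothetical, trimmed-down sample modelled on the question's file.
SAMPLE = f"""<w:numbering xmlns:w="{W_NS}">
  <w:lvl w:ilvl="0">
    <w:numFmt w:val="bullet"/>
    <w:rPr><w:rFonts w:ascii="Symbol" w:hAnsi="Symbol" w:hint="default"/></w:rPr>
  </w:lvl>
  <w:lvl w:ilvl="8">
    <w:rPr><w:rFonts w:hint="default"/><w:lang w:val="de-DE"/></w:rPr>
  </w:lvl>
</w:numbering>"""

root = ET.fromstring(SAMPLE)
removed_count = remove_elements(root, f"{{{W_NS}}}rPr")
print(removed_count)  # 2
print(ET.tostring(root, encoding="unicode"))
```

To process a real file, ET.parse("file.xml") followed by tree.write("file.xml", encoding="UTF-8", xml_declaration=True) would play the role of -L. One caveat: ElementTree re-declares only the namespaces it finds in element and attribute names and may rename unregistered prefixes, so an attribute value that refers to prefixes by name, such as mc:Ignorable="w14 w15 ...", can end up pointing at undeclared prefixes. For real Word files, xmlstarlet or lxml, which keep the original prefixes, are probably the safer choice.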

TL;DR

Please, never ever use sed for this task!

Every time you use sed on HTML or XML, you kill a kitty.

Theory:

According to compiler theory, XML/HTML can't be parsed with regular expressions, which are equivalent to finite state machines. Because XML/HTML is hierarchically nested, you need a pushdown automaton, for example a parser generated from an LALR grammar with a tool like YACC.
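A small Python sketch of what goes wrong in practice, using a made-up document (NESTED is hypothetical) in which an <a> element is nested inside another <a>. The regex has no notion of nesting depth, whereas a parser (here the standard-library xml.etree.ElementTree, chosen only for illustration) already knows where each element ends.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical input: an <a> element nested inside another <a>.
NESTED = "<root><a>outer<a>inner</a>tail</a><b/></root>"

# Regex attempt: the non-greedy match ends at the FIRST </a>, so the outer
# element's closing tag is left behind.
regex_output = re.sub(r"<a>.*?</a>", "", NESTED)
print(regex_output)  # <root>tail</a><b/></root>

# The regex result is no longer well-formed XML, so parsing it fails.
try:
    ET.fromstring(regex_output)
    regex_output_is_well_formed = True
except ET.ParseError:
    regex_output_is_well_formed = False
print(regex_output_is_well_formed)  # False

# Parser attempt: the tree records where each element ends, so whole
# subtrees can be detached safely.
root = ET.fromstring(NESTED)
for parent in list(root.iter()):
    for child in list(parent):
        if child.tag == "a":
            parent.remove(child)
remaining_tags = [child.tag for child in root]
print(remaining_tags)  # ['b']
```

A greedy pattern (.* instead of .*?) fails in the opposite direction: it runs to the last </a> in the input and swallows any unrelated siblings in between.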

realLife©®™ everyday tools in a shell:

You can use one of the following:

xmllint: often installed by default with libxml2; XPath 1.0

xmlstarlet: can edit, select, transform...; not installed by default; XPath 1.0

xpath: installed via Perl's module XML::XPath; XPath 1.0

xidel: XPath 3

saxon-lint: my own project, a wrapper over Michael Kay's Saxon-HE Java library; XPath 3

Or you can use a high-level language with a proper library, for example:

Python's lxml (from lxml import etree)

Perl's XML::LibXML, XML::XPath, XML::Twig::XPath, HTML::TreeBuilder::XPath

Ruby's Nokogiri

PHP's DOMXPath


Check: Using regular expressions with HTML tags
