How to delete duplicated elements in an XML file


Jeanne

Here is my XML file. It contains duplicated elements, for example <houseNum>0</houseNum>.

<?xml version="1.0" encoding="utf-8"?>
<ArrayOfHouse>
<XmlForm>
<houseNum>0</houseNum>
 <plan1> 
  <coord>
    <X> 1.2  </X>
    <Y> 2.1  </Y>
    <Z> 3.0  </Z>
  </coord>
  <color> 
    <R> 255 </R>
    <G> 0   </G>
    <B> 0   </B>
  </color>
 </plan1>
 <plan2>
  <coord>  
    <X> 21.2  </X>
    <Y> 22.1  </Y>
    <Z> 31.0  </Z>
  </coord>
  <color> 
    <R> 255 </R>
    <G> 0   </G>
    <B> 0   </B>
</color>
 </plan2> 
</XmlForm>
<XmlForm>
<houseNum>0</houseNum>
 <plan1> 
  <coord>
    <X> 1.2  </X>
    <Y> 2.1  </Y>
    <Z> 3.0  </Z>
  </coord>
  <color> 
    <R> 255 </R>
    <G> 0   </G>
    <B> 0   </B>
  </color>
 </plan1>
 <plan2>
  <coord>  
    <X> 21.2  </X>
    <Y> 22.1  </Y>
    <Z> 31.0  </Z>
  </coord>
  <color> 
    <R> 255 </R>
    <G> 0   </G>
    <B> 0   </B>
</color>
 </plan2> 
</XmlForm>

<XmlForm>
<houseNum>1</houseNum>
 <plan1> 
  <coord>
    <X> 11.2  </X>
    <Y> 12.1  </Y>
    <Z> 13.0  </Z>
  </coord>
  <color> 
    <R> 255 </R>
    <G> 255   </G>
    <B> 0   </B>
  </color>
 </plan1>
 <plan2>
  <coord>  
    <X> 211.2  </X>
    <Y> 212.1  </Y>
    <Z> 311.0  </Z>
  </coord>
  <color> 
    <R> 255 </R>
    <G> 0   </G>
    <B> 255   </B>
</color>
 </plan2> 
</XmlForm>
</ArrayOfHouse>

In my case, there are two types of duplication.

1) When the duplicated elements are consecutive, this is my code to remove them. It compares element[i] with element[i + 1]; if element[i].text == element[i + 1].text, it deletes element[i + 1].

import os
import time

from lxml import etree


def Remove_Duplication_XML(xml_file):
    base_name = os.path.basename(xml_file)
    start_time = time.time()
    tree = etree.parse(xml_file)

    # Collect every <houseNum> element in document order
    root = tree.getroot()
    elementlist = [e for e in root.iter('houseNum')]
    numframes = [x.text for x in elementlist]
    print(numframes)
    for index_element in range(1, len(elementlist)):
        current = elementlist[index_element]
        try:
            # Compare with the previous element, so only consecutive
            # duplicates are detected
            if current.text == elementlist[index_element - 1].text:
                # Remove the duplicated <houseNum> element from its parent
                current.getparent().remove(current)
                print(current.text)
        except Exception as exc:
            print(' except  ', exc)

    # Serialize the XML without the duplicates
    file = etree.tostring(root).decode("utf-8")
    print(file)

2) When the duplicated elements are not consecutive, I am still looking for a way to handle it. Any help, please?

Parfait

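Consider XSLT, the special-purpose language designed to transform XML files (much as SQL is a special-purpose language for querying databases). Because you are already using Python's lxml, you can run such a script seamlessly, without a single for loop or if logic, to remove duplicates anywhere in the document.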

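Specifically, run Muenchian grouping, the XSLT 1.0 method that indexes your XML document by houseNum using <xsl:key> and returns the distinct groups. As an added bonus, the XSLT below also strips the whitespace in text nodes and pretty-prints the result with indentation.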
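To make the grouping idea concrete, here is a rough Python analogy of what the key lookup and the generate-id() comparison do. It is only an illustrative sketch, not part of the XSLT solution: it uses the standard library's xml.etree.ElementTree instead of lxml, and the shortened sample document, the <tag> element, and the function name first_of_each_group are all made up for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, shortened sample: houseNum 0 appears twice, non-consecutively.
SAMPLE = """<ArrayOfHouse>
  <XmlForm><houseNum>0</houseNum><tag>a</tag></XmlForm>
  <XmlForm><houseNum>1</houseNum><tag>b</tag></XmlForm>
  <XmlForm><houseNum>0</houseNum><tag>c</tag></XmlForm>
</ArrayOfHouse>"""


def first_of_each_group(root):
    """Return the XmlForm elements that come first in their houseNum group."""
    forms = root.findall("XmlForm")
    # Plays the role of <xsl:key>: houseNum value -> forms in document order.
    index = defaultdict(list)
    for form in forms:
        index[form.findtext("houseNum")].append(form)
    # Plays the role of the generate-id() test: a form is kept only if it
    # is the very first element stored under its own key.
    return [f for f in forms if index[f.findtext("houseNum")][0] is f]


root = ET.fromstring(SAMPLE)
kept = first_of_each_group(root)
print([f.findtext("tag") for f in kept])
```

The third form shares its key with the first one but is not the first entry in that group, so it is the one that gets filtered out, regardless of how far apart the two are in the document.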

XSLT (save as an .xsl file, which is a special kind of .xml file)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes" method="xml"/>
  <xsl:strip-space elements="*"/>

  <xsl:key name="id" match="XmlForm" use="houseNum" />

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="XmlForm[generate-id() != generate-id(key('id', houseNum))]"/>

  <xsl:template match="text()">
    <xsl:value-of select="normalize-space(.)"/>
  </xsl:template>

</xsl:stylesheet>

Python

import os
import lxml.etree as et

# LOAD XML AND XSL FILES
xml = et.parse('Source.xml')
xsl = et.parse('XSLTScript.xsl')

# TRANSFORM SOURCE
transform = et.XSLT(xsl)
result = transform(xml)

# PRINT RESULT TO SCREEN
print(result)

# SAVE RESULT TO FILE
with open('Output.xml', 'wb') as f:
    f.write(result)

Output (note that the text values have their surrounding whitespace trimmed)

<?xml version="1.0"?>
<ArrayOfHouse>
  <XmlForm>
    <houseNum>0</houseNum>
    <plan1>
      <coord>
        <X>1.2</X>
        <Y>2.1</Y>
        <Z>3.0</Z>
      </coord>
      <color>
        <R>255</R>
        <G>0</G>
        <B>0</B>
      </color>
    </plan1>
    <plan2>
      <coord>
        <X>21.2</X>
        <Y>22.1</Y>
        <Z>31.0</Z>
      </coord>
      <color>
        <R>255</R>
        <G>0</G>
        <B>0</B>
      </color>
    </plan2>
  </XmlForm>
  <XmlForm>
    <houseNum>1</houseNum>
    <plan1>
      <coord>
        <X>11.2</X>
        <Y>12.1</Y>
        <Z>13.0</Z>
      </coord>
      <color>
        <R>255</R>
        <G>255</G>
        <B>0</B>
      </color>
    </plan1>
    <plan2>
      <coord>
        <X>211.2</X>
        <Y>212.1</Y>
        <Z>311.0</Z>
      </coord>
      <color>
        <R>255</R>
        <G>0</G>
        <B>255</B>
      </color>
    </plan2>
  </XmlForm>
</ArrayOfHouse>
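If you would rather stay in plain Python, non-consecutive duplicates can also be handled by remembering which houseNum values have already been seen. The following is a minimal sketch under stated assumptions, not a drop-in replacement for the XSLT: it uses the standard library's xml.etree.ElementTree rather than lxml, the sample document is shortened, and the function name remove_duplicate_forms is made up here. It removes the whole XmlForm, as the XSLT does, and trims whitespace before comparing.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened sample with non-consecutive duplicates.
SAMPLE = """<ArrayOfHouse>
  <XmlForm><houseNum> 0 </houseNum></XmlForm>
  <XmlForm><houseNum>1</houseNum></XmlForm>
  <XmlForm><houseNum>0</houseNum></XmlForm>
  <XmlForm><houseNum>1</houseNum></XmlForm>
</ArrayOfHouse>"""


def remove_duplicate_forms(root):
    """Drop every XmlForm whose houseNum was already seen; return the count."""
    seen = set()
    removed = 0
    # findall() returns a list, so removing children while looping is safe.
    for form in root.findall("XmlForm"):
        key = (form.findtext("houseNum") or "").strip()
        if key in seen:
            root.remove(form)
            removed += 1
        else:
            seen.add(key)
    return removed


root = ET.fromstring(SAMPLE)
removed = remove_duplicate_forms(root)
remaining = [f.findtext("houseNum").strip() for f in root.findall("XmlForm")]
print(removed, remaining)
```

Because the set keeps every value encountered so far, the check does not depend on duplicates being adjacent, which is the limitation of comparing element[i] only with element[i - 1].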
