How do I create a DataFrame using xml.etree.ElementTree in Python?

Reinaldo Chaves

In Python 3, I have an XML file that I want to convert to a pandas DataFrame.

To do this, I thought of using the xml.etree.ElementTree Python module.

The XML has the following structure:

<?xml version="1.0" encoding="UTF-8"?>
<estacao_rd><row item="1" SiglaServico="TV" id="57dbaad053c60" state="TV-C1" entidade="X-MEDIAGROUP S.A." fistel="50410887137" cnpj="0321181400
0163" CodMunicipio="1200336" municipio="Mâncio Lima" uf="AC"><entidade entidade_nome_entidade="X-MEDIAGROUP S.A." entidade_nome_fantasia="" ent
idade_ddd="" entidade_telefone="" entidade_email="" entidade_codtipotaxa="" entidade_tipotaxa="" entidade_idttipoorgao="" entidade_tipoorgao=""
 localEspecifico="" responsavel_legal_cpf="" responsavel_legal_nome_completo="" responsavel_legal_email="" num_servico="248" srd_planobasico_in
d_carater="P" num_fistel="50410887137" habilitacao_NumScradJur="16082" habilitacao_NumScradTec="16081" habilitacao_DataLimiteInstalacao="" habi
litacao_DataValFreq="" habilitacao_DataContrato="" habilitacao_IndEducativo="0" network="" carater="P" fase="1" canalCidadania="N" codFinalidad
e="0" finalidade="Comercial"><observacao>RESOLUCAO 291/02;Ato nº 2.720, de 29/04/2011, publicado no DOU. de 03/05/2011.</observacao></entidade>
<administrativo numero_estacao="" NomeIndicativo="" FindCodSituacaoLicenca="" FindDatalimiteInstalacao="" DataEmissaoLicenca="" DataReemissaoLi
cenca="" NumLicenca=""><aprovacao_locais seq="1" NumProcesso="" NumDocumento="" IdtTipoDocumento="" TipoDocumento="" CodOrgao="" DataDocumento=
"" DataDOU="" IdtRazao="" Razao="" IndNatureza="" Natureza=""/><historico_documentos seq="1" NumProcesso="9999" NumDocumento="123" IdtTipoDocum
ento="18" TipoDocumento="Despacho" CodOrgao="CMPRL" DataDocumento="20/08/2012" DataDOU="22/08/2012" IdtRazao="2" Razao="Consol. Carac. Técnicas
" IndNatureza="Técnico" Natureza="Técnico"/><historico_documentos seq="2" NumProcesso="9999" NumDocumento="245" IdtTipoDocumento="3" TipoDocume
nto="Decreto Legislativo" CodOrgao="CN" DataDocumento="20/10/2015" DataDOU="21/10/2015" IdtRazao="7" Razao="Deliber.  do C. Nacional" IndNature
za="Jurídico" Natureza="Jurídico"/></administrativo><enderecos><estacao estacao_EndLogradouro="" estacao_EndBairro="" estacao_CodMunicipio="" e
stacao_NomeMunicipio="" estacao_SiglaUF="" estacao_CodCep=""/><estacaoprincipal estacaoprincipal_EndLogradouro="" estacaoprincipal_EndBairro=""
 estacaoprincipal_CodMunicipio="" estacaoprincipal_NomeMunicipio="" estacaoprincipal_SiglaUF="" estacaoprincipal_CodCep=""/><estacaoauxiliar es
tacaoauxiliar_EndLogradouro="" estacaoauxiliar_EndBairro="" estacaoauxiliar_CodMunicipio="" estacaoauxiliar_NomeMunicipio="" estacaoauxiliar_Si
glaUF="" estacaoauxiliar_CodCep=""/></enderecos><estacao_principal transmissor_CodHomologacao="" transmissor_fabricante="" transmissor_Modelo="
" transmissor_Potencia_Operacao="" linhaTransmissao_Principal_Modelo="" linhaTransmissao_Principal_Fabricante="" linhaTransmissao_Principal_Med
Comprimento="" linhaTransmissao_Principal_MedAtenLinhaTransmissaodB100m="" linhaTransmissao_Principal_PerdasAcessorias_db="0.5" linhaTransmissa
o_Principal_MedImpedanciaCarac="" antena_Principal_Fabricante="" antena_Principal_Modelo="" antena_Principal_Ganho_dBd="" antena_Principal_Beam
Tilt="" antena_Principal_OrientacaoNV="" antena_Principal_Polarizacao="" antena_Principal_HCI="" antena_Principal_Nulos="" antena_Principal_Obs
ervacao="" ERP="" latitude="" longitude=""/><estacao_auxiliar/><transmissor_auxiliar transmissoraux_CodHomologacao="" transmissoraux_fabricante
="" transmissoraux_Modelo="" transmissoraux_Potencia_Operacao=""/><transmissor_auxiliar2 transmissoraux2_CodHomologacao="" transmissoraux2_fabr
icante="" transmissoraux2_Modelo="" transmissoraux2_Potencia_Operacao=""/><linha_auxiliar linhaTransmissao_Auxiliar_Modelo="" linhaTransmissao_
Auxiliar_Fabricante="" linhaTransmissao_Auxiliar_MedComprimento="" linhaTransmissao_Auxiliar_MedAtenLinhaTransmissaodB100m="" linhaTransmissao_
Auxiliar_PerdasAcessorias_db="" linhaTransmissao_Auxiliar_MedImpedanciaCarac=""/><antena_auxiliar antena_Auxiliar_Fabricante="" antena_Auxiliar
_Modelo="" antena_Auxiliar_Ganho_dBd="" antena_Auxiliar_BeamTilt="" antena_Auxiliar_OrientacaoNV="" antena_Auxiliar_Polarizacao="" antena_Auxil
iar_HCI="" antena_Auxiliar_Nulos="" antena_Auxiliar_Observacao=""/><horario_funcionamento><item seq="0" dia_inicio="" dia_fim="" hora_inicio=""
 hora_fim=""/></horario_funcionamento></row>...

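It is an XML file about TV and radio channels, containing the name and location of each broadcaster.

I picked some of the information for the DataFrame columns and did the conversion like this: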

import xml.etree.ElementTree as et 
import pandas as pd

xtree = et.parse("concessoes/dez_2019/estacao_rd.xml")
xroot = xtree.getroot()

df_cols = ["row item", 
           "SiglaServico", 
           "id", 
           "state", 
           "entidade", 
           "fistel", 
           "cnpj", 
           "CodMunicipio", 
           "municipio", 
           "uf", 
           "responsavel_legal_cpf", 
           "responsavel_legal_nome_completo", 
           "entidade_nome_fantasia",
           "finalidade"]
rows = []

for node in xroot: 
    s_row_item = node.attrib.get("row item")
    s_SiglaServico = node.attrib.get("SiglaServico")
    s_id = node.attrib.get("id")
    s_state = node.attrib.get("state")
    s_entidade = node.attrib.get("entidade")
    s_fistel = node.attrib.get("fistel")
    s_cnpj = node.attrib.get("cnpj")
    s_CodMunicipio = node.attrib.get("CodMunicipio")
    s_municipio = node.attrib.get("municipio")
    s_uf = node.attrib.get("uf")
    s_responsavel_legal_cpf = node.attrib.get("responsavel_legal_cpf")
    s_responsavel_legal_nome_completo = node.attrib.get("responsavel_legal_nome_completo")
    s_entidade_nome_fantasia = node.attrib.get("entidade_nome_fantasia")
    s_finalidade = node.attrib.get("finalidade")


    rows.append({"row_item": s_row_item, 
                 "SiglaServico": s_SiglaServico, 
                 "id": s_id, 
                 "state": s_state,
                 "entidade": s_entidade,
                 "fistel": s_fistel,
                 "cnpj": s_cnpj,
                 "CodMunicipio": s_CodMunicipio,
                 "municipio": s_municipio,
                 "uf": s_uf,
                 "responsavel_legal_cpf": s_responsavel_legal_cpf,
                 "responsavel_legal_nome_completo": s_responsavel_legal_nome_completo,
                 "entidade_nome_fantasia": s_entidade_nome_fantasia,
                 "finalidade": s_finalidade,
                })

out_df = pd.DataFrame(rows, columns = df_cols)

The conversion works, but some of the columns are empty:

out_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 27956 entries, 0 to 27955
Data columns (total 14 columns):
row item                           0 non-null float64
SiglaServico                       27956 non-null object
id                                 27956 non-null object
state                              27956 non-null object
entidade                           27956 non-null object
fistel                             27956 non-null object
cnpj                               27956 non-null object
CodMunicipio                       27956 non-null object
municipio                          27956 non-null object
uf                                 27956 non-null object
responsavel_legal_cpf              0 non-null object
responsavel_legal_nome_completo    0 non-null object
entidade_nome_fantasia             0 non-null object
finalidade                         0 non-null object
dtypes: float64(1), object(13)
memory usage: 3.0+ MB

The XML file can be downloaded here.

Does anyone know why some columns come out empty with this approach, and how to fix it?

年轻的萨拉萨尔

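That is because each row has child elements: the attributes that came back empty belong to elements nested inside <row>, not to <row> itself. I added another loop inside the node loop to read them from the child elements, and exported the result to CSV.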

import xml.etree.ElementTree as et
import pandas as pd

xtree = et.parse("estacao_rd.xml")
xroot = xtree.getroot()

df_cols = ["item",
           "SiglaServico",
           "id",
           "state",
           "entidade",
           "fistel",
           "cnpj",
           "CodMunicipio",
           "municipio",
           "uf",
           "responsavel_legal_cpf",
           "responsavel_legal_nome_completo",
           "entidade_nome_fantasia",
           "finalidade"]
rows = []

for node in xroot:

    s_row_item = node.attrib.get("item")
    s_SiglaServico = node.attrib.get("SiglaServico")
    s_id = node.attrib.get("id")
    s_state = node.attrib.get("state")
    s_entidade = node.attrib.get("entidade")
    s_fistel = node.attrib.get("fistel")
    s_cnpj = node.attrib.get("cnpj")
    s_CodMunicipio = node.attrib.get("CodMunicipio")
    s_municipio = node.attrib.get("municipio")
    s_uf = node.attrib.get("uf")
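    # The remaining attributes sit on the <entidade> child element, so iterate
    # over that tag only; a bare node.iter() would also visit <row> itself and
    # every other descendant, appending mostly empty duplicate rows.
    for child in node.iter("entidade"):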
        s_responsavel_legal_cpf = child.attrib.get("responsavel_legal_cpf")
        s_responsavel_legal_nome_completo = child.attrib.get("responsavel_legal_nome_completo")
        s_entidade_nome_fantasia = child.attrib.get("entidade_nome_fantasia")
        s_finalidade = child.attrib.get("finalidade")
        rows.append({"item": s_row_item,
                     "SiglaServico": s_SiglaServico,
                     "id": s_id,
                     "state": s_state,
                     "entidade": s_entidade,
                     "fistel": s_fistel,
                     "cnpj": s_cnpj,
                     "CodMunicipio": s_CodMunicipio,
                     "municipio": s_municipio,
                     "uf": s_uf,
                     "responsavel_legal_cpf": s_responsavel_legal_cpf,
                     "responsavel_legal_nome_completo": s_responsavel_legal_nome_completo,
                     "entidade_nome_fantasia": s_entidade_nome_fantasia,
                     "finalidade": s_finalidade,
                    })
# Build the DataFrame with a fixed column order
out_df = pd.DataFrame(rows, columns=df_cols)
# Export to CSV
out_df.to_csv('test.csv', index=False)
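
To make the cause easier to see, here is a minimal sketch on a tiny inline sample rather than the real file. The sample XML, the shortened column lists, and the helper name collect_records are assumptions made up for illustration; only the element and attribute names come from the question. It shows that Element.attrib only holds the attributes written on that element's own tag, so finalidade is not visible on <row> and has to be read from the nested <entidade>. The same reasoning explains the empty "row item" column in the question: row is the tag name and item is the attribute, so the key to look up is "item".

```python
import xml.etree.ElementTree as et

# Hypothetical two-row sample that mimics the shape of estacao_rd.xml.
SAMPLE_XML = """<estacao_rd>
  <row item="1" SiglaServico="TV" uf="AC">
    <entidade entidade_nome_fantasia="" finalidade="Comercial"/>
  </row>
  <row item="2" SiglaServico="FM" uf="SP">
    <entidade entidade_nome_fantasia="Radio X" finalidade="Educativo"/>
  </row>
</estacao_rd>"""

ROW_COLS = ["item", "SiglaServico", "uf"]                  # attributes on <row>
ENTIDADE_COLS = ["entidade_nome_fantasia", "finalidade"]   # attributes on <entidade>


def collect_records(root):
    """Return one dict per <row>, merging in the nested <entidade> attributes."""
    records = []
    for node in root.findall("row"):
        # Attributes written on the <row> tag itself.
        record = {col: node.attrib.get(col) for col in ROW_COLS}
        # Attributes written on the nested <entidade> tag (it may be absent).
        entidade = node.find("entidade")
        child_attrs = entidade.attrib if entidade is not None else {}
        record.update({col: child_attrs.get(col) for col in ENTIDADE_COLS})
        records.append(record)
    return records


root = et.fromstring(SAMPLE_XML)
first_row = root.find("row")

# Looking up a child's attribute on the parent finds nothing.
print(first_row.attrib.get("finalidade"))
# The same lookup on the child element finds the value.
print(first_row.find("entidade").attrib.get("finalidade"))

records = collect_records(root)
print(records)
```

The resulting list of dicts can be passed to pd.DataFrame(records, columns=ROW_COLS + ENTIDADE_COLS), in the same way the code above builds out_df. Because each <row> contributes exactly one dict, the DataFrame keeps one line per station even when a row has no <entidade> child.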
