I have a long XML document with the following structure:
<carrierData>
<inspections>
<inspection inspection_date="2013-01-16" report_state="TX" report_number="TX130G0ELJ05" level="1" time_weight="1">
<drivers>
<driver driver_type="Primary Driver" first_name="JOHN" last_name="SMITH" date_of_birth="1962-11-20" license_state="TX" License_number="12345678"/>
<driver driver_type="CoDriver"/>
</drivers>
<vehicles>
<vehicle unit="1" vehicle_id_number="2HSCAAXN02C039269" unit_type="Truck Tractor" license_state="TX" license_number="1B13577"/>
<vehicle unit="2" vehicle_id_number="1GRAA76228S702393" unit_type="Semi-Trailer" license_state="TX" license_number="X99757"/>
</vehicles>
<violations>
<violation code="393.11" description="No/defective lighting devices/reflective devices/projected" oos="N" time_severity_weight="3" BASIC="Vehicle Maint."/>
<violation code="393.53(b)" description="Automatic brake adjuster CMV manufactured on or after 10/20/1994 - air brake" oos="N" time_severity_weight="4" BASIC="Vehicle Maint."/>
<violation code="393.47(e)" description="Clamp/Roto-Chamber type brake(s) out of adjustment" oos="N" time_severity_weight="4" BASIC="Vehicle Maint."/>
<violation code="396.3(a)(1)" description="Inspection/repair and maintenance parts and accessories" oos="N" time_severity_weight="2" BASIC="Vehicle Maint."/>
</violations>
</inspection>
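I need to loop over a list of inspection report numbers and print the first and last name of each driver associated with each number in the list. I am using Python's ElementTree to parse the XML, and although the code below raises no errors, it also gives me no results: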
import xml.etree.ElementTree as ET
codes = ['TX3YZ8HQE1X1', 'TX3YAEHQE15W', 'KS00YQ008857', 'TX43D99DAN33', 'NM3267100378',
'COPF31000853', 'TX3ZYF0MUQ6F', 'TX3ZFC0MHXLU', 'TX3Z760MGU0H', 'TX3YGG0MUQ1R',
'TX3YBD0MUI0A', 'TX3XPF0MKQYG', 'TX3X8F0MHXA7', 'AZ0160001581', 'TX3WC40ADYGZ',
'ID6300005350', 'TX3VV50ADUOI', 'TX137S0ELO02', 'UTCE03208119', 'UTCE03208119',
'TX3UTG0MJKDL', 'TX3UD60MIJU5', 'TX13690EBI05', 'TX3U4E0AFA94', 'TX3U4E0AFA94',
'TX3T5F0MIJMH', 'TX13550BKL02', 'TX3SLE0MIJGZ', 'TX3SLE0MIJGZ', 'TX3S8D0AFH3D',
'UTCE03207947', 'TX133Q0ENG01', 'TX133Q0ENG01', 'TX133Q0ENG01', 'TX3REM0MHEK3',
'ID0000169042', 'COPF05000200', 'TX13280EPV0B', 'TX131S9DAB02', 'CO1E19000017',
'TX3PD60WAA4L', 'TX1317W1NW07', 'CO2D02000044', 'LALAEQ001266', 'TX130H0EBT06',
'TX3NW10ABLMK', 'NV7233010192', 'NV4045000998', 'CO3301000406', 'CO5C01000218',
'TX12949DBU03', 'FL1619000314', 'TX12929DIE02', 'TX128X0AAP01', 'TX128A9DHA07',
'CO2B01000061', 'TX1274W1DV01', 'TX126Z9DCM01', 'TX127U9DBV01', 'TX127U9DBV01',
'TX127R9DIZ02', 'TX127K9DCQ06', 'AZ0YDG000141', 'NV7196001031', 'TX126B0FJZ01',
'TX126I9DAN01', 'LALACV003777', 'CO2B12000014', 'TX12650HTB01', 'ID0000220955']
tree = ET.parse("C:\All_BASICs_07-25-2014.xml")
root = tree.getroot()
for x in codes:
    for node in tree.iter('inspection'):
        if ['report_id'] == [x]:
            name = node.attrib.get('first_name','last_name')
            print name
I'm new to programming, so I may be missing something obvious here, but without an error to refer to I'm having a hard time finding the problem.
What do you mean by this line?
if ['report_id'] == [x]:
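With this code, you are testing ['report_id'] == ['TX3YZ8HQE1X1'], then ['report_id'] == ['TX3YAEHQE15W'], and so on. Each test compares a list holding the literal string 'report_id' with a list holding one code, so it is always false. That is why your code prints nothing and exits without an error.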
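To make the difference concrete, here is a minimal sketch contrasting the literal-list comparison with a lookup of the actual attribute value. The attrib dictionary is a hand-written stand-in for node.attrib on one inspection element, and it assumes the intended attribute is report_number, as in the posted XML.

```python
x = 'TX3YZ8HQE1X1'

# Compares two one-element lists of strings: 'report_id' vs. the code.
# No XML attribute is read here, so this is False for every code.
always_false = (['report_id'] == [x])
print(always_false)

# Stand-in for node.attrib on a matching <inspection> element (assumed data).
attrib = {'report_number': 'TX3YZ8HQE1X1'}

# Reads the attribute value and compares it with the code.
matches = (attrib['report_number'] == x)
print(matches)
```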
There is nothing named report_id in the XML you posted. Did you mean report_number?
If you want to grab the primary driver's name for every report_number in the codes list, try something like this:
for x in codes:
    for node in tree.iter('inspection'):
        if node.attrib['report_number'] == x:
            primary_driver = [d for d in node.iter('driver') if d.attrib['driver_type'] == "Primary Driver"]
            primary_driver = primary_driver[0]
            first_name = primary_driver.attrib['first_name']
            last_name = primary_driver.attrib['last_name']
            print first_name, last_name
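However, this code has a performance problem. You are traversing the entire XML document once for every code in codes. That has complexity O(number_of_codes * number_of_records), which is effectively O(N**2) when the two counts are of similar size. You can do it in O(N) steps instead by looping over the document once and using a set to decide whether each record should be included.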
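A minimal sketch of that single-pass approach follows. It is written for Python 3 (print as a function), and it runs against a small inline document rather than the real file. The helper name primary_drivers, the second inspection (ZZ000000000, JANE DOE), and the shortened codes list are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Abbreviated stand-in for the real document; the second inspection is invented.
SAMPLE_XML = """
<carrierData>
  <inspections>
    <inspection report_number="TX130G0ELJ05">
      <drivers>
        <driver driver_type="Primary Driver" first_name="JOHN" last_name="SMITH"/>
        <driver driver_type="CoDriver"/>
      </drivers>
    </inspection>
    <inspection report_number="ZZ000000000">
      <drivers>
        <driver driver_type="Primary Driver" first_name="JANE" last_name="DOE"/>
      </drivers>
    </inspection>
  </inspections>
</carrierData>
"""


def primary_drivers(root, codes):
    """Return (report_number, first_name, last_name) for each wanted inspection."""
    wanted = set(codes)  # average O(1) membership test; duplicates collapse
    results = []
    for inspection in root.iter('inspection'):  # one pass over the document
        report_number = inspection.get('report_number')
        if report_number not in wanted:
            continue
        for driver in inspection.iter('driver'):
            # .get() returns None instead of raising if the attribute is absent
            if driver.get('driver_type') == 'Primary Driver':
                results.append((report_number,
                                driver.get('first_name'),
                                driver.get('last_name')))
    return results


root = ET.fromstring(SAMPLE_XML)
matches = primary_drivers(root, ['TX130G0ELJ05', 'TX130G0ELJ05', 'KS00YQ008857'])
for report_number, first_name, last_name in matches:
    print(report_number, first_name, last_name)
```

For the real file, root would come from ET.parse(...).getroot() as in the question. One behavioral difference from the nested loop: results come out in document order and once per inspection, so a code that is repeated in codes is no longer printed twice.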