I have the following XML file, and I want to read the content of the <seg> elements
and save it to a plain text file using Python. I am using the DOM module.
<?xml version="1.0"?>
<mteval>
<tstset setid="default" srclang="any" trglang="TRGLANG" sysid="SYSID">
<doc docid="ntpmt-dev-2000/even1k.cn.seg.txt">
<seg id="1">therefore , can be obtained having excellent properties ( good stability and solubility of the balance of the crystal as a pharmaceutical compound is not possible to predict .</seg>
<seg id="3">compound ( I ) are preferably crystalline , in particular , has good stability and solubility equilibrium and suitable for industrial prepared type A crystal is preferred .</seg>
<seg id="4">method B included in the catalyst such as DMF , and the like in the presence of a compound of formula ( II ) with thionyl chloride or oxalyl chloride to give an acyl chloride , in the presence of a base of the acid chloride with alcohol ( IV ) ( O ) by reaction of esterification .</seg>
</doc>
</tstset>
</mteval>
from xml.dom.minidom import parse
import xml.dom.minidom
dom = xml.dom.minidom.parse(r"path_to_xml file")
file = dom.documentElement
seg = dom.getElementsByTagName("seg")
for item in seg:
    sent = item.firstChild.data
    print(sent,sep='')

file = open(r'file.txt','w')
file.write(sent)
file.close()
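When I run the code above, it prints all the lines to the screen successfully, but file.txt contains only the last <seg> line
(seg id=4). I actually want to save all the sentences to the file. Is there something wrong with my code?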
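You are calling file.write(sent) only once, after the loop has finished, so only the last value of sent reaches the file.
Open the file before the loop, and call file.write(sent) inside the loop body: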
file = open(r'file.txt','w')
for item in seg:
    sent = item.firstChild.data
    print(sent,sep='')
    file.write(sent)  # <---- this line
file.close()
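A slightly fuller sketch of the same fix may be useful. It assumes the output should have one sentence per line, whereas the code above writes the sentences back to back with no separator. The inline SAMPLE_XML (a shortened stand-in for the file in the question), the helper names extract_segments and save_segments, and the temporary output path are illustrative assumptions, not part of the original question or answer.

```python
import os
import tempfile
from xml.dom.minidom import parseString

# Shortened stand-in for the XML in the question (sentence text is abbreviated).
SAMPLE_XML = """<?xml version="1.0"?>
<mteval>
<tstset setid="default" srclang="any" trglang="TRGLANG" sysid="SYSID">
<doc docid="ntpmt-dev-2000/even1k.cn.seg.txt">
<seg id="1">first sentence .</seg>
<seg id="3">second sentence .</seg>
<seg id="4">third sentence .</seg>
</doc>
</tstset>
</mteval>"""


def extract_segments(xml_text):
    """Return the text of every <seg> element, in document order."""
    dom = parseString(xml_text)
    sentences = []
    for item in dom.getElementsByTagName("seg"):
        # firstChild is None for an empty <seg/>, so guard against it.
        if item.firstChild is not None:
            sentences.append(item.firstChild.data)
    return sentences


def save_segments(xml_text, out_path):
    """Write one <seg> sentence per line to out_path."""
    sentences = extract_segments(xml_text)
    # The with-block closes the file even if a write fails.
    with open(out_path, "w", encoding="utf-8") as out:
        for sent in sentences:
            out.write(sent + "\n")
    return sentences


# Hypothetical output location: a temporary directory instead of file.txt.
out_path = os.path.join(tempfile.mkdtemp(), "file.txt")
saved = save_segments(SAMPLE_XML, out_path)
```

To read from a file on disk instead, xml.dom.minidom.parse(path) can replace parseString, as in the question. Opening the file once and writing inside the loop is the essential change. The with-block and the trailing newline are optional refinements.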