Using a Perl regex to separate XML content from a string

Susheel Singh

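I have the string below, which contains text separated by newlines (\n). I want to use a regex to match the XML content, remove all whitespace and \n from it, and turn it into a single line. I am using the following regex: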

my $string = "this contains the text which I pasted below in before section";
$string=~ m/(^.*)(<[a-zA-Z]*>)/;
$extractedXml = $2;

Why does the code above fail to get the XML content?

Before:

G11N/Locale=en_USY:/default/main/test1/test/test2/test4/test5/default.site
G11N/Localizable=true
TeamSite/Assocation/Version=1
TeamSite/LiveSite/DeploymentAudit=<?xml version="1.0" encoding="UTF-8"?>
<Deployments>
    <test>hello</test>
</Deployments>

After:

Y:/default/main/test1/test/test2/test4/test5/default.site
G11N/Locale=en_US
G11N/Localizable=true
TeamSite/Assocation/Version=1
TeamSite/LiveSite/DeploymentAudit=<?xml version="1.0" encoding="UTF-8"?><Deployments><test>hello</test></Deployments>

http://regex101.com/r/zZ0wB8
You can check that it works there, but in the actual code it only matches the first line.

Zimbabwe

For your example, the following solution works:

my $string = <<"FOO";
G11N/Locale=en_USY:/default/main/test1/test/test2/test4/test5/default.site
G11N/Localizable=true
TeamSite/Assocation/Version=1
TeamSite/LiveSite/DeploymentAudit=<?xml version="1.0" encoding="UTF-8"?>
<Deployments>
    <test>hello</test>
</Deployments>
FOO

$string =~ s/^\s+(<.+$)/$1/gm;
$string =~ s/>\n/>/gm;

print $string;

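It first strips the leading whitespace from lines that start with whitespace followed by an XML tag, then removes the newline at the end of any line that ends with an XML tag.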
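
As for why the pattern in the question does not capture the XML from the sample input: without the /s modifier, . does not match a newline, and without /m, ^ matches only at the very start of the string. So (^.*) can cover only the first line, which contains no tag, and the match fails. In addition, <[a-zA-Z]*> can match only a single bare tag such as <Deployments>, not the <?xml ...?> declaration or a whole element. The following is a minimal sketch, not part of the original answer, that captures from the first < to the last > using /s. The sub name extract_xml is hypothetical, and it assumes that the XML is the last thing in the string and that whitespace between tags is insignificant.

```perl
use strict;
use warnings;

# Hypothetical helper: return the XML part of $text as a single line,
# or nothing (undef in scalar context) when no '<' ... '>' span is found.
sub extract_xml {
    my ($text) = @_;

    # [^<]* skips everything before the first '<' (newlines included).
    # /s lets '.' match newlines, so the capture can span several lines
    # and runs up to the last '>' in the string.
    my ($xml) = $text =~ /\A[^<]*(<.*>)/s;
    return unless defined $xml;

    # Drop whitespace (including line breaks) between one tag and the next.
    $xml =~ s/>\s+</></g;
    return $xml;
}

my $sample = join "\n",
    'G11N/Localizable=true',
    'TeamSite/LiveSite/DeploymentAudit=<?xml version="1.0" encoding="UTF-8"?>',
    '<Deployments>',
    '    <test>hello</test>',
    '</Deployments>',
    '';

print extract_xml($sample), "\n";
# prints: <?xml version="1.0" encoding="UTF-8"?><Deployments><test>hello</test></Deployments>
```

Unlike the two substitutions, this returns only the XML and leaves the original string untouched.</Deployments>';
die "expected undef for input without tags" if defined(extract_xml("no tags here\n"));
</test>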

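The two substitutions are a very pragmatic approach and will most likely not work in every case. Because they match \n literally, they only work for files with Unix line endings.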
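
One possible way around the line-ending limitation, sketched here as an assumption and not as part of the original answer, is to match the line break with \R, which is available in Perl 5.10 and later and covers both \n and \r\n. The sub name collapse_xml_lines is hypothetical.

```perl
use strict;
use warnings;
use 5.010;    # \R in regexes needs Perl 5.10 or later

# Hypothetical helper: the same two steps, but tolerant of both
# Unix (\n) and Windows (\r\n) line endings.
sub collapse_xml_lines {
    my ($text) = @_;

    # Step 1: strip spaces and tabs in front of a line that starts with '<'.
    $text =~ s/^[ \t]+(?=<)//gm;

    # Step 2: remove the line break that directly follows a '>'.
    $text =~ s/>\R/>/g;

    return $text;
}

my $crlf = "Audit=<?xml version=\"1.0\"?>\r\n"
         . "<Deployments>\r\n"
         . "    <test>hello</test>\r\n"
         . "</Deployments>\r\n";

print collapse_xml_lines($crlf), "\n";
# prints: Audit=<?xml version="1.0"?><Deployments><test>hello</test></Deployments>
```

Lines that do not end in > keep their original line endings, so the non-XML lines before the XML stay on separate lines.</Deployments>';
my $unix = collapse_xml_lines("a=b\nX=<r>\n  <c>1</c>\n</r>\n");
die "unexpected LF result: $unix" unless $unix eq "a=b\nX=<r><c>1</c></r>";
</test>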
