Parsing HTML tags inside XML in PHP

Posted by debugcn in Dev

Adrian

I'm trying to build my own RSS feed (for learning purposes) in PHP, using simplexml_load_string to parse http://uk.news.yahoo.com/rss. I'm stuck on reading the HTML tags inside the <description> tag.

So far, my code looks like this:

$feed = file_get_contents('http://uk.news.yahoo.com/rss');
$rss = simplexml_load_string($feed);

//for each element in the feed
foreach ($rss->channel->item as $item) {
    echo '<h3>' . $item->title . '</h3>';

    foreach ($item->description as $desc) {

        // how to read the href from the a tag???

        // this does not work at all
        $tags = $item->xpath('//a');
        foreach ($tags as $tag) {
            echo $tag['href'];
        }
    }
}

Any ideas on how to extract each HTML tag?

Thanks

MrCode

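The description content has its special characters encoded, so it is not treated as nodes in the XML, only as a string. That is why the XPath query for a elements finds nothing. You can decode the special characters, load the resulting HTML into a DOMDocument, and then do whatever you need with it. For example: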

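foreach ($rss->channel->item as $item) {
    echo '<h3>' . $item->title . '</h3>';

    foreach ($item->description as $desc) {
        // Feed HTML is rarely well-formed; keep libxml warnings out of the output
        libxml_use_internal_errors(true);

        $dom = new DOMDocument();
        $dom->loadHTML(htmlspecialchars_decode((string) $desc));

        // Print the href of the first <a>, if the description contains one
        $anchors = $dom->getElementsByTagName('a');
        if ($anchors->length > 0) {
            echo $anchors->item(0)->getAttribute('href');
        }
    }
}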

XPath can also be used with DOMDocument; see DOMXPath.
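
As a rough illustration of the DOMXPath route, here is a minimal sketch that collects every href from a description instead of only the first one. The helper name extractLinks and the inline sample feed are hypothetical, made up for this example so it runs without fetching the real feed. It also assumes the description is entity-encoded once, in which case casting the SimpleXML element to a string already yields the decoded HTML.

```php
<?php
// Hypothetical sample feed standing in for the real RSS document.
$feed = <<<'XML'
<rss version="2.0">
  <channel>
    <item>
      <title>Example story</title>
      <description>&lt;p&gt;&lt;a href="http://example.com/story-1"&gt;Read more&lt;/a&gt; Summary text.&lt;/p&gt;</description>
    </item>
  </channel>
</rss>
XML;

/**
 * Hypothetical helper: return the href of every <a> found in an HTML fragment.
 */
function extractLinks(string $html): array
{
    // loadHTML() rejects empty input, so bail out early.
    if (trim($html) === '') {
        return [];
    }

    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Query the parsed HTML for anchors that carry an href attribute.
    $xpath = new DOMXPath($dom);
    $links = [];
    foreach ($xpath->query('//a[@href]') as $anchor) {
        $links[] = $anchor->getAttribute('href');
    }
    return $links;
}

$rss = simplexml_load_string($feed);
foreach ($rss->channel->item as $item) {
    // The string cast returns the description text with XML entities decoded.
    $links = extractLinks((string) $item->description);
    echo $item->title, ': ', implode(', ', $links), PHP_EOL;
}
```

Running the query against the DOMDocument rather than the SimpleXML object is the important part: the anchors only exist as nodes once the description string has been parsed as HTML.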
