I can't figure out how to solve this. Given the following HTML:
<div>
<p id="p1"> Price is <span>$ 25</span></p>
<p id='p2'> But this price is $ <span id="s1">50,23</span> </p>
<p id='p3'> This one : $ 14540.12 dollar</p>
</div>
What I'm trying to do is find every element that contains a price, along with the shortest path to it. This is what I have so far:
$elements = $dom->getElementsByTagName('*');
foreach ($elements as $child) {
    if (preg_match("/.$regex./", $child->nodeValue)) {
        echo $child->getNodePath() . "<br />";
    }
}
This results in
/html
/html/body
/html/body/div
/html/body/div/p[1]
/html/body/div/p[1]/span
/html/body/div/p[2]
/html/body/div/p[2]/span
/html/body/div/p[3]
These are the paths to the elements I want, so that's OK for this test HTML. But in real web pages these paths get very long and are error-prone. What I'd like to do instead is find the closest element with an ID attribute and refer to that.
So once I have found an element that matches the $regex, I need to travel up the DOM, find the first element with an ID attribute, and create a new, shorter path from it. In the HTML example above, there are 3 prices matching the $regex. The prices are in:
//p[@id="p1"]/span
//span[@id="s1"]
//p[@id="p3"]
So that is what I'd like to have returned from my function. That means I also need to get rid of all the other paths, because those elements don't directly contain a match for $regex.
Any help on this?
You could use XPath to follow the ancestor axis up to the first node that has an @id attribute, and then cut that node's path off the front of the full path. I have not cleaned up the code, but something like this:
// snip: $dom, $elements and $regex are set up as in the question
$xpath = new DOMXPath($dom);
foreach ($elements as $child) {
    // Only look at the element's own text nodes, not its descendants' text
    $textValue = '';
    foreach ($xpath->query('text()', $child) as $text) {
        $textValue .= $text->nodeValue;
    }
    if (preg_match("/.$regex./", $textValue)) {
        $path = $child->getNodePath();
        // Nearest element (the node itself or an ancestor) that has an id
        $id = $xpath->query('ancestor-or-self::*[@id][1]', $child)->item(0);
        if ($id) {
            // Replace the absolute prefix up to that element with an id lookup
            $idpath = $id->getNodePath();
            $path = '//'.$id->nodeName.'[@id="'.$id->getAttribute('id').'"]'.substr($path, strlen($idpath));
        }
        echo $path."\n";
    }
}
This prints something like:
/html
/html/body
/html/body/div
//p[@id="p1"]
//p[@id="p1"]/span
//p[@id="p2"]
//span[@id="s1"]
//p[@id="p3"]
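If you only want the three paths listed in the question, the same idea can be wrapped in a function that returns an array instead of echoing. This is just a rough, self-contained sketch: the function name findPricePaths is made up for illustration, and since the question never shows the real $regex, the pattern below (digits with an optional "." or "," decimal part) is an assumption, as is wrapping the sample HTML in html/body tags.

```php
<?php
// Sample input: the HTML fragment from the question, wrapped in html/body.
$html = <<<'HTML'
<html><body><div>
<p id="p1"> Price is <span>$ 25</span></p>
<p id='p2'> But this price is $ <span id="s1">50,23</span> </p>
<p id='p3'> This one : $ 14540.12 dollar</p>
</div></body></html>
HTML;

// Assumed pattern (not from the question): digits with an optional decimal part.
$regex = '/\d+(?:[.,]\d+)?/';

/**
 * Hypothetical helper: returns an id-anchored path for every element
 * whose own text nodes (not its descendants' text) match $regex.
 */
function findPricePaths(DOMDocument $dom, string $regex): array
{
    $xpath = new DOMXPath($dom);
    $paths = [];

    foreach ($dom->getElementsByTagName('*') as $element) {
        // Concatenate only the direct text children of this element.
        $ownText = '';
        foreach ($xpath->query('text()', $element) as $textNode) {
            $ownText .= $textNode->nodeValue;
        }
        if (!preg_match($regex, $ownText)) {
            continue;
        }

        $path = $element->getNodePath();

        // Nearest element (itself or an ancestor) carrying an id attribute.
        $anchor = $xpath->query('ancestor-or-self::*[@id][1]', $element)->item(0);
        if ($anchor !== null) {
            // Swap the absolute prefix for an id lookup, keep the remainder.
            $path = '//' . $anchor->nodeName
                  . '[@id="' . $anchor->getAttribute('id') . '"]'
                  . substr($path, strlen($anchor->getNodePath()));
        }
        $paths[] = $path;
    }

    return $paths;
}

$dom = new DOMDocument();
$dom->loadHTML($html);

$paths = findPricePaths($dom, $regex);
echo implode("\n", $paths), "\n";
```

With the assumed pattern and the sample HTML, this should give //p[@id="p1"]/span, //span[@id="s1"] and //p[@id="p3"]. The containers (html, body, div, and the p elements whose price sits in a child span) drop out because only each element's own text is tested. With a different $regex, for example one that also matches the bare "$" sign, p2 would show up as well.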