On this page I would like Selenium for Python to grab the text contents of the "Investment Objective", excluding the <h3>
header. I want to use XPath.
The nodes look like this:
<div class="carousel-content column fund-objective">
<h3 class="carousel-header">INVESTMENT OBJECTIVE</h3>
The Fund seeks to track the performance of an index composed of 25 of the largest Dutch companies listed on NYSE Euronext Amsterdam.
</div>
To retrieve the text, I'm using:
string = driver.find_element_by_xpath(xpath).text
If I use this XPath for the top node:
xpath = '//div[@class="carousel-content column fund-objective"]'
It will work, but it includes the <h3>
header INVESTMENT OBJECTIVE
— which I want to exclude.
However, if I try to use /text()
to address the actual text content, it seems that Selenium for Python doesn't let me grab it whilst using the .text
to get the attribute:
xpath = '//div[@class="carousel-content column fund-objective"]/text()'
Note that there seem to be multiple nodes matching this XPath on this particular page, so I'm specifying the correct node like this:
xpath = '(//div[@class="carousel-content column fund-objective"]/text())[2]'
My interpretation of the problem is that .text
doesn't allow me to retrieve the text contents of the XPath sub-node text()
. My apologies for incorrect terminology.
/text()
locates a text node, which is not an element node. Selenium can only return element nodes (as WebElement objects), so there is nothing to call .text
on.
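To see the difference between element nodes and text nodes outside the browser, here is a small sketch using the standard library's xml.dom.minidom instead of Selenium. It assumes the snippet from the question is well-formed XML, which a real page may not be, and the variable names are only for illustration.

```python
from xml.dom import minidom

# The markup from the question, assumed here to be well-formed XML.
SNIPPET = """<div class="carousel-content column fund-objective">
<h3 class="carousel-header">INVESTMENT OBJECTIVE</h3>
The Fund seeks to track the performance of an index composed of 25 of the largest Dutch companies listed on NYSE Euronext Amsterdam.
</div>"""

div = minidom.parseString(SNIPPET).documentElement

# The div has three direct children: a whitespace text node,
# the <h3> element node, and the text node holding the objective.
child_names = [node.nodeName for node in div.childNodes]

# Joining only the direct text nodes skips the <h3> and its contents.
own_text = "".join(
    node.data for node in div.childNodes if node.nodeType == node.TEXT_NODE
).strip()

print(child_names)
print(own_text)
```

The XPath step /text() selects those text-node children, which is why it cannot be handed back as an element.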
One solution is to locate both elements and remove the unwanted text:
xpath = '//div[@class="carousel-content column fund-objective"]'
element = driver.find_element_by_xpath(xpath)
all_text = element.text
title_text = element.find_element_by_xpath('./*[@class="carousel-header"]').text
# str.replace returns a new string, so keep the result; remove only the first occurrence
objective = all_text.replace(title_text, '', 1).strip()
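Another option, if you would rather not do string surgery, is to let the browser collect the div's direct text nodes through JavaScript. This is only a sketch: own_text and OWN_TEXT_JS are names made up here, and it assumes driver is a live WebDriver and element is the div located as above. Note that textContent returns the raw text, which can differ slightly from what .text reports for rendered text.

```python
# JavaScript run in the page: join the direct text-node children of the
# element passed in as arguments[0], ignoring child elements such as <h3>.
OWN_TEXT_JS = """
return Array.from(arguments[0].childNodes)
    .filter(node => node.nodeType === Node.TEXT_NODE)
    .map(node => node.textContent)
    .join('');
"""


def own_text(driver, element):
    """Return the element's own text, without text from child elements.

    Hypothetical helper: `driver` is assumed to be a Selenium WebDriver
    and `element` a WebElement found with it.
    """
    return driver.execute_script(OWN_TEXT_JS, element).strip()


# Usage, assuming `driver` has already loaded the page:
# element = driver.find_element_by_xpath(
#     '//div[@class="carousel-content column fund-objective"]')
# print(own_text(driver, element))
```

This keeps the XPath pointing at an element, which is what Selenium expects, and leaves the text-node handling to the browser.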