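Can someone please help me? I have had no luck and have been working on this for over a week! As in the picture below, when an entry contains the word "Cardiovascular", I want to click the plus buttons to expand it and print its contents.
This is the code I have: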
from selenium import webdriver
chrome_path=r"G:\My Drive\chrome_driver\chromedriver_win32\chromedriver.exe"
driver=webdriver.Chrome(chrome_path)
driver.get('https://meshb.nlm.nih.gov/treeView')
sidebar = driver.find_element_by_xpath("/html/body/div[2]/div")
i=1
for i in range(16): # since I have 16 div(s)
    sidebar.find_elements_by_xpath("/html/body/div[2]/div/div[i]")
    element = driver.find_element_by_xpath("/html/body/div[2]/div").find_element_by_xpath("/html/body/div[2]/div/div[i]").find_element_by_xpath("//*[@class='ng-scope']/span")
    element.click()
But I keep getting this error:
no such element: Unable to locate element: {"method":"xpath","selector":"/html/body/div[2]/div/div[i]"}
I have also posted two screenshots of the HTML page: one shows all the divs, and the other shows one of the divs expanded. Any help is greatly appreciated!
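The selector in the error message contains the literal text div[i]: inside a quoted string, i is never replaced by the loop variable. A minimal sketch of building the indexed XPath instead, assuming the same page layout as in the question (the helper name indexed_xpath is hypothetical, and XPath positions start at 1, so the range runs from 1 to 16):

```python
def indexed_xpath(i):
    """Return the XPath of the i-th child div (hypothetical helper)."""
    # format() substitutes the number into the selector string
    return "/html/body/div[2]/div/div[{}]".format(i)

# range(1, 17) covers the 16 divs mentioned in the question
paths = [indexed_xpath(i) for i in range(1, 17)]
print(paths[0])
print(paths[-1])
```

Each of these strings could then be passed to the driver's XPath lookup in place of the fixed selector.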
Explanation of the code:
i = 0
while True:
    # locate all expandable elements
    elements = driver.find_elements_by_xpath("//span[@ng-if = 'node.HasChildren']/i[@ng-click='getTreeChildren(node)']")
    if len(elements) > i:
        elements[i].click() # click on the i-th element in the list
        i += 1 # increment i
        time.sleep(0.5) # wait for the list to update
        continue
    break
This creates an infinite loop that locates all expandable elements on every pass. It takes the i-th element and clicks it to expand the node, then waits for the children to load (you can set a different value for the wait). The continue statement then restarts the loop from the beginning. This repeats as long as the number of located elements is greater than i. Once it is not, the break statement ends the loop. After this you can scrape the data.
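The loop can be tried without a browser. The following is only a sketch: FakePage and the small TREE dictionary are hypothetical stand-ins for the page and for driver.find_elements_by_xpath, not part of Selenium. It shows why the list has to be located again on every pass: each click adds new expandable nodes.

```python
# Hypothetical stand-in for the site: node -> children revealed by a click
TREE = {
    "Anatomy [A]": ["Body Regions [A01]", "Cardiovascular System [A07]"],
    "Cardiovascular System [A07]": ["Blood Vessels [A07.231]", "Heart [A07.541]"],
    "Blood Vessels [A07.231]": ["Arteries [A07.231.114]"],
}

class FakePage:
    """Browser-free model of the tree view."""
    def __init__(self, roots):
        self.visible = list(roots)  # nodes currently shown, in page order
        self.clicks = []            # record of clicked nodes

    def find_expandable(self):
        # Plays the role of the XPath lookup: visible nodes that have children
        return [n for n in self.visible if n in TREE]

    def click(self, node):
        # Reveal the children directly below the clicked node
        self.clicks.append(node)
        pos = self.visible.index(node) + 1
        self.visible[pos:pos] = TREE[node]

page = FakePage(["Anatomy [A]"])
i = 0
while True:
    elements = page.find_expandable()  # locate again on every pass
    if len(elements) > i:
        page.click(elements[i])        # expand the i-th element
        i += 1
        continue
    break                              # nothing left to expand

print(page.visible)
```

In this model no sleep is needed because the children appear immediately; on the real page the pause gives the children time to load.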
Now all the data is visible on the page and you can locate all the elements you need. I assume you want the text of the spans (like <span ng-if="!node.strong" class="ng-binding ng-scope">Blood-Air Barrier [A07.025]</span>):
# spans is the list of all descendants (children, grandchildren, etc.) of the current node and the current node itself
spans = driver.find_elements_by_xpath("//span[contains(., 'Cardiovascular')]/parent::*/parent::*/descendant-or-self::node()/a/span")
If you don't want the node itself in the list, you can do it like this:
# spans is the list of all descendants (children, grandchildren, etc.) of the current node without current node itself
spans = driver.find_elements_by_xpath("//span[contains(., 'Cardiovascular')]/parent::*/parent::*/descendant::node()/a/span")
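The difference between the two axes can be seen on a toy document. This is a sketch only: the markup below is a simplified, hypothetical imitation of the page, and because Python's built-in xml.etree.ElementTree does not support the parent:: or descendant:: axes, the paths './/a/span' and './*//a/span' are used as rough equivalents of the two expressions above.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical markup; the real page structure may differ
MARKUP = """
<ul>
  <li><a><span>Cardiovascular System [A07]</span></a>
    <ul>
      <li><a><span>Blood-Air Barrier [A07.025]</span></a></li>
      <li><a><span>Heart [A07.541]</span></a></li>
    </ul>
  </li>
  <li><a><span>Cells [A11]</span></a></li>
</ul>
"""

root = ET.fromstring(MARKUP)

# The list item whose own label contains the search word
node = next(li for li in root.iter("li")
            if "Cardiovascular" in li.findtext("a/span", ""))

# Like descendant-or-self: the node's own label plus all labels below it
with_self = [s.text for s in node.findall(".//a/span")]

# Like descendant: only the labels below the node
without_self = [s.text for s in node.findall("./*//a/span")]

print(with_self)
print(without_self)
```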
And finally you can, for example, print the text of all elements like this:
for span in spans:
    print(span.text)
The template:
from selenium import webdriver
import time
chrome_path = r"C:\Users\Andrei\Desktop\driver\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get('https://meshb.nlm.nih.gov/treeView')
driver.implicitly_wait(5) # wait up to 5 seconds when locating elements
i = 0
while True:
    # locate all expandable elements
    elements = driver.find_elements_by_xpath("//span[@ng-if = 'node.HasChildren']/i[@ng-click='getTreeChildren(node)']")
    if len(elements) > i:
        elements[i].click() # click on the i-th element in the list
        i += 1 # increment i
        time.sleep(0.5) # wait for the list to update
        continue
    break
spans = driver.find_elements_by_xpath("//span[contains(., 'Cardiovascular')]/parent::*/parent::*/descendant-or-self::node()/a/span")
for span in spans:
    print(span.text)
Output:
Blood-Air Barrier [A07.025]
Blood-Aqueous Barrier [A07.030]
Blood-Brain Barrier [A07.035]
...
EDIT:
For a quick check you can use this template (it expands only a few nodes, just for testing purposes):
from selenium import webdriver
import time
chrome_path = r"C:\Users\Andrei\Desktop\driver\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get('https://meshb.nlm.nih.gov/treeView')
driver.implicitly_wait(5) # wait up to 5 seconds when locating elements
i = 0
while i < 9:
    # locate all expandable elements
    elements = driver.find_elements_by_xpath("//span[@ng-if = 'node.HasChildren']/i[@ng-click='getTreeChildren(node)']")
    if len(elements) > i:
        if i == 0:
            elements[i].click() # expand the first node
            i += 6 # skip ahead so that only a few nodes are expanded
        elements[i].click() # click on the i-th element in the list
        i += 1 # increment i
        time.sleep(0.5) # wait for the list to update
        continue
    break
spans = driver.find_elements_by_xpath("//span[contains(., 'Cardiovascular')]/parent::*/parent::*/descendant-or-self::node()/a/span")
for span in spans:
    print(span.text)
Output:
Cardiovascular System [A07]
Blood-Air Barrier [A07.025]
Blood-Aqueous Barrier [A07.030]
Blood-Brain Barrier [A07.035]
Blood-Nerve Barrier [A07.037]
Blood-Retinal Barrier [A07.040]
Blood-Testis Barrier [A07.045]
Blood Vessels [A07.231]
Adventitia [A07.231.057]
Arteries [A07.231.114]
Microvessels [A07.231.461]
Retinal Vessels [A07.231.611]
Tunica Intima [A07.231.700]
Tunica Media [A07.231.733]
Vasa Nervorum [A07.231.765]
Vasa Vasorum [A07.231.836]
Veins [A07.231.908]
Glomerular Filtration Barrier [A07.500]
Heart [A07.541]
More information about XPath axes here
ADD:
Since the node list on the website is very big (I didn't know that), I have added a lighter version of the code above. The logic is almost the same. The difference is this: first the 16 main nodes are expanded, then the node we are searching for is located, and then only its children are expanded. This gets the result much more quickly, but if the searched node is not on the "first" level, nothing will be found. It is possible to go "deeper" and search on the second, third, etc. levels, but that would be complicated to implement. At least the logic for dealing with this problem is, I think, clear. PS: the code above also works, but it slows down when many nodes are present, so it spends more time in time.sleep().
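Note: the search string must be specific enough that only one node is matched. For example, for Cardiovascular there are two nodes: Cardiovascular System [A07] and Cardiovascular Diseases [C14]. Also, the program will not expand all the nodes of Cardiovascular Diseases [C14]. If you want to expand the second node as well, the code below needs a small change. For Cardiovascular System, as used below, there is only one node.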
from selenium import webdriver
import time
chrome_path = r"C:\Users\Andrei\Desktop\driver\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get('https://meshb.nlm.nih.gov/treeView')
driver.implicitly_wait(1) # wait up to 1 second when locating elements
search_word = "Cardiovascular System"
elements_xpath = "//span[@ng-if = 'node.HasChildren']/i[@ng-click='getTreeChildren(node)']"
spans_xpath = "//span[contains(., '" + search_word + "')]/parent::*/parent::*/descendant-or-self::node()/a/span"
link_xpath = "//span[@class = 'ng-binding ng-scope']"
# expand all first-level nodes
elements = driver.find_elements_by_xpath(elements_xpath)
for element in elements:
    element.click()
    time.sleep(0.3)
# search for the position of the span
elements = driver.find_elements_by_xpath(link_xpath)
i = 0
for element in elements:
    if search_word in element.text:
        break
    i += 1
# i is the position where the search word was found
# now it is time to expand all child nodes at position i
end = i + 1
elements = driver.find_elements_by_xpath("//span[@class = 'ng-scope']")
old_length = len(elements)
while i < end:
    elements[i].click()
    i += 1
    time.sleep(0.3)
    elements = driver.find_elements_by_xpath("//span[@class = 'ng-scope']")
    # newly revealed nodes extend the range that still has to be clicked
    end = end + len(elements) - old_length
    old_length = len(elements)
# all child nodes are expanded
# time to collect information
spans = driver.find_elements_by_xpath(spans_xpath)
for span in spans:
    print(span.text)
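The note about the search string can be checked without a browser. A minimal sketch, assuming a small hand-written list of node labels (the list and the helper name matching_labels are hypothetical); it uses the same substring test as the search_word in element.text check in the code above:

```python
# Hand-written sample of node labels, for illustration only
labels = [
    "Cardiovascular System [A07]",
    "Cardiovascular Diseases [C14]",
    "Cells [A11]",
]

def matching_labels(search_word, labels):
    """Return every label that contains search_word as a substring."""
    return [label for label in labels if search_word in label]

# Too short: matches both Cardiovascular nodes
print(matching_labels("Cardiovascular", labels))

# Specific enough: matches exactly one node
print(matching_labels("Cardiovascular System", labels))
```

With the shorter word, the position search in the code above would stop at whichever matching node comes first on the page, which is why the longer phrase is the safer choice.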