Open all plus buttons on a page using Selenium and Python


nina_dev

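Can anyone help me? I have been working on this for more than a week with no luck! I would like to click and open multiple plus buttons when they contain the word "Cardiovascular", as in the picture below, and print the contents.

Here is the code I have: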

from selenium import webdriver
chrome_path=r"G:\My Drive\chrome_driver\chromedriver_win32\chromedriver.exe"
driver=webdriver.Chrome(chrome_path)
driver.get('https://meshb.nlm.nih.gov/treeView')
sidebar = driver.find_element_by_xpath("/html/body/div[2]/div")
i=1
for i in range(16):  # since I have 16 div(s)
   sidebar.find_elements_by_xpath("/html/body/div[2]/div/div[i]")       
   element = driver.find_element_by_xpath("/html/body/div[2]/div").find_element_by_xpath("/html/body/div[2]/div/div[i]").find_element_by_xpath("//*[@class='ng-scope']/span")
   element.click()

But I keep getting this error:

no such element: Unable to locate element: {"method":"xpath","selector":"/html/body/div[2]/div/div[i]"}

I have also posted two screenshots of the HTML page: one shows all the divs, and the other shows one of the divs expanded. Any help is much appreciated!

Andrei Suvorkov

Explanation of the code:

i = 0
while True:
  # locate all elements
  elements = driver.find_elements_by_xpath("//span[@ng-if = 'node.HasChildren']/i[@ng-click='getTreeChildren(node)']")
  if len(elements) > i:
      elements[i].click() # click on the i-th element in the list
      i += 1 # increment i
      time.sleep(0.5) # wait until list will be updated
      continue
  break

This creates an infinite loop that re-locates all expandable elements on every pass. It takes the i-th element and clicks it to expand it, then waits for the dropdown to load (you can set a different wait value). The continue statement then restarts the loop from the top. This repeats as long as the list of located elements is longer than i; once it is not, the break statement ends the loop. After that you can scrape the data.
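
The control flow of this loop can be tried without a browser. The following is a minimal sketch, not the code from this answer: expand_all is a hypothetical helper that accepts any zero-argument callable returning the current list of clickable objects, and FakePage and FakeButton are stand-ins invented here so the loop can run offline. With Selenium one would presumably pass a lambda that calls driver.find_elements_by_xpath(...) with the XPath shown above.

```python
import time


def expand_all(find_buttons, pause=0.5):
    """Click every expand button, re-locating the list after each click.

    find_buttons: zero-argument callable returning the current list of
    objects that have a .click() method (hypothetical interface).
    Returns the number of clicks performed.
    """
    i = 0
    while True:
        buttons = find_buttons()      # locate all elements again
        if i >= len(buttons):         # nothing left to click
            return i
        buttons[i].click()            # click the i-th element
        i += 1
        time.sleep(pause)             # give the page time to update


class FakeButton:
    """Stand-in for a plus button; clicking reveals `children` new buttons."""

    def __init__(self, page, children):
        self.page = page
        self.children = children
        self.clicked = False

    def click(self):
        self.clicked = True
        for _ in range(self.children):
            self.page.buttons.append(FakeButton(self.page, 0))


class FakePage:
    """Stand-in for the page: two top-level buttons with 2 and 1 children."""

    def __init__(self):
        self.buttons = [FakeButton(self, 2), FakeButton(self, 1)]


page = FakePage()
clicks = expand_all(lambda: list(page.buttons), pause=0)
print(clicks, "buttons clicked")
```

Because the list is looked up again on every pass, buttons that only appear after an earlier click are still reached, which is the point of the loop.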

Now all the data is visible on the page and you can locate all the elements you need. I assume you want the text of the spans (such as <span ng-if="!node.strong" class="ng-binding ng-scope">Blood-Air Barrier [A07.025]</span>):

# spans is the list of all descendants (children, grandchildren, etc.) of the current node and the current node itself
spans = driver.find_elements_by_xpath("//span[contains(., 'Cardiovascular')]/parent::*/parent::*/descendant-or-self::node()/a/span")

If you don't want the node itself in the list, you can do this:

# spans is the list of all descendants (children, grandchildren, etc.) of the current node without current node itself
spans = driver.find_elements_by_xpath("//span[contains(., 'Cardiovascular')]/parent::*/parent::*/descendant::node()/a/span")
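
The difference between the two axes can be illustrated offline with Python's standard xml.etree.ElementTree module. This is only a sketch under assumptions: the MARKUP string is a simplified, hypothetical stand-in for the page's real HTML, and Element.iter() (which yields the element itself followed by everything below it) is used to imitate descendant-or-self, since ElementTree does not support these XPath axes directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: each <li> holds <a><span>label</span></a>
# and, optionally, a nested <ul> with child nodes.
MARKUP = """
<li><a><span>Cardiovascular System [A07]</span></a>
  <ul>
    <li><a><span>Blood Vessels [A07.231]</span></a>
      <ul><li><a><span>Arteries [A07.231.114]</span></a></li></ul>
    </li>
    <li><a><span>Heart [A07.541]</span></a></li>
  </ul>
</li>
""".strip()


def labels(nodes):
    """Collect the text of every a/span directly under each given node."""
    return [span.text for node in nodes for span in node.findall("a/span")]


root = ET.fromstring(MARKUP)
all_nodes = list(root.iter())          # root first, then all descendants

with_self = labels(all_nodes)          # imitates descendant-or-self::node()
without_self = labels(all_nodes[1:])   # imitates descendant::node()

print(with_self)
print(without_self)
```

In this model the first list starts with the Cardiovascular System label and the second one does not; the remaining labels are the same in both.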

And finally you can, for example, print the text of all elements like this:

for span in spans:
    print(span.text)

The template:

from selenium import webdriver
import time

chrome_path = r"C:\Users\Andrei\Desktop\driver\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get('https://meshb.nlm.nih.gov/treeView')
driver.implicitly_wait(5) # wait up to 5 seconds for elements to appear when locating them

i = 0
while True:
  # locate all elements
  elements = driver.find_elements_by_xpath("//span[@ng-if = 'node.HasChildren']/i[@ng-click='getTreeChildren(node)']")
  if len(elements) > i:
      elements[i].click() # click on the i-th element in the list
      i += 1 # increment i
      time.sleep(0.5) # wait until list will be updated
      continue
  break


spans = driver.find_elements_by_xpath("//span[contains(., 'Cardiovascular')]/parent::*/parent::*/descendant-or-self::node()/a/span")
for span in spans:
    print(span.text)

Output:

Blood-Air Barrier [A07.025]
Blood-Aqueous Barrier [A07.030]
Blood-Brain Barrier [A07.035]
...

EDIT:

For a quick check you can use this template (it expands only a few nodes, for testing purposes):

from selenium import webdriver
import time

chrome_path = r"C:\Users\Andrei\Desktop\driver\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get('https://meshb.nlm.nih.gov/treeView')
driver.implicitly_wait(5) # wait up to 5 seconds for elements to appear when locating them

i = 0
while i < 9:
  # locate all elements
  elements = driver.find_elements_by_xpath("//span[@ng-if = 'node.HasChildren']/i[@ng-click='getTreeChildren(node)']")
  if len(elements) > i:
      if i == 0:
          elements[i].click()
          i += 6
      elements[i].click() # click on the i-th element in the list
      i += 1 # increment i
      time.sleep(0.5) # wait until list will be updated
      continue
  break


spans = driver.find_elements_by_xpath("//span[contains(., 'Cardiovascular')]/parent::*/parent::*/descendant-or-self::node()/a/span")
for span in spans:
    print(span.text)

Output:

Cardiovascular System [A07]
Blood-Air Barrier [A07.025]
Blood-Aqueous Barrier [A07.030]
Blood-Brain Barrier [A07.035]
Blood-Nerve Barrier [A07.037]
Blood-Retinal Barrier [A07.040]
Blood-Testis Barrier [A07.045]
Blood Vessels [A07.231]
Adventitia [A07.231.057]
Arteries [A07.231.114]
Microvessels [A07.231.461]
Retinal Vessels [A07.231.611]
Tunica Intima [A07.231.700]
Tunica Media [A07.231.733]
Vasa Nervorum [A07.231.765]
Vasa Vasorum [A07.231.836]
Veins [A07.231.908]
Glomerular Filtration Barrier [A07.500]
Heart [A07.541]

For more information on the axes used here (parent, descendant, descendant-or-self), see the XPath documentation.

ADD:

Since the node list on the website is very big (I didn't know that), I have added a lighter version of the code above. The logic is almost the same as before. The difference is this: first the 16 main nodes are expanded, then the node we are searching for is located, and then only its children are expanded. This gets the result much more quickly, but if the search node is not on the "first" level, nothing will be found. It is possible to go "deeper" and search on the second, third, etc. levels, but that would be complicated to implement. At least the logic for dealing with this problem is, I think, clear. PS: the code above also works, but it slows down when many nodes are present, because more time is spent in time.sleep().

Note: you have to provide a complete phrase so that the search string matches only one node. For example, Cardiovascular matches two nodes: Cardiovascular System [A07] and Cardiovascular Diseases [C14]. The program will not expand all the nodes of Cardiovascular Diseases [C14]; if you want to expand the second node as well, you will need to modify the script at the end of this answer slightly. For Cardiovascular System there is only one node.
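
A quick way to see why the search string matters is to test it against the two labels mentioned in the note. This is a minimal sketch: matching_labels and spans_xpath are hypothetical helper names, the LABELS list contains only the two labels named above, and the XPath is assembled the same way as the spans XPath used in this answer.

```python
def spans_xpath(search_word):
    """Build the spans XPath for a given search string."""
    return ("//span[contains(., '" + search_word + "')]"
            "/parent::*/parent::*/descendant-or-self::node()/a/span")


def matching_labels(labels, search_word):
    """Return the labels that contain search_word as a substring."""
    return [label for label in labels if search_word in label]


# Only the two labels named in the note; the real page has many more.
LABELS = ["Cardiovascular System [A07]", "Cardiovascular Diseases [C14]"]

ambiguous = matching_labels(LABELS, "Cardiovascular")
unique = matching_labels(LABELS, "Cardiovascular System")

print(len(ambiguous), "matches for 'Cardiovascular'")
print(len(unique), "match for 'Cardiovascular System'")
print(spans_xpath("Cardiovascular System"))
```

Running the same kind of check against the labels actually read from the page before clicking anything would show whether the chosen search string is specific enough.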

from selenium import webdriver
import time

chrome_path = r"C:\Users\Andrei\Desktop\driver\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get('https://meshb.nlm.nih.gov/treeView')
driver.implicitly_wait(1) # wait up to 1 second for elements to appear when locating them

search_word = "Cardiovascular System"
elements_xpath = "//span[@ng-if = 'node.HasChildren']/i[@ng-click='getTreeChildren(node)']"
spans_xpath = "//span[contains(., '" + search_word + "')]/parent::*/parent::*/descendant-or-self::node()/a/span"
link_xpath = "//span[@class = 'ng-binding ng-scope']"


# expand all nodes first level
elements = driver.find_elements_by_xpath(elements_xpath)
for element in elements:
    element.click()
    time.sleep(0.3)

# search for span position
elements = driver.find_elements_by_xpath(link_xpath)
i = 0
for element in elements:
    if search_word in element.text:
        break
    i += 1

# i is the position where the 'Cardiovascular' was found
# now is time to expand all child nodes at i position
end = i + 1
elements = driver.find_elements_by_xpath("//span[@class = 'ng-scope']")
old_length = len(elements)

while i < end:
    elements[i].click()
    i += 1
    time.sleep(0.3)
    elements = driver.find_elements_by_xpath("//span[@class = 'ng-scope']")
    end = end + len(elements) - old_length
    old_length = len(elements)

# all child nodes are expanded
# time to collect information
spans = driver.find_elements_by_xpath(spans_xpath)
for span in spans:
    print(span.text)
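
The bookkeeping with end and old_length in the while loop can be traced without a browser. The sketch below is a simplified model, not the page's real behaviour: rows is a flat list of labels in document order, click is a hypothetical function that inserts a node's children directly after it, every row is treated as clickable, and the CHILDREN table and the two sibling labels are made up for illustration.

```python
# Hypothetical child table: which labels appear when a node is expanded.
CHILDREN = {
    "Cardiovascular System [A07]": ["Blood Vessels [A07.231]", "Heart [A07.541]"],
    "Blood Vessels [A07.231]": ["Arteries [A07.231.114]", "Veins [A07.231.908]"],
}


def click(rows, index):
    """Simulate expanding rows[index]: its children appear right after it."""
    rows[index + 1:index + 1] = CHILDREN.get(rows[index], [])


def expand_subtree(rows, start):
    """Expand rows[start] and everything that appears beneath it."""
    i = start
    end = start + 1                    # window initially covers one row
    old_length = len(rows)
    while i < end:
        click(rows, i)
        i += 1
        end += len(rows) - old_length  # grow the window by the new rows
        old_length = len(rows)
    return rows[start:end]


rows = ["Sibling before (hypothetical)",
        "Cardiovascular System [A07]",
        "Sibling after (hypothetical)"]
subtree = expand_subtree(rows, 1)
for label in subtree:
    print(label)
```

Each click that adds rows pushes end further out by the same amount, so the loop stops exactly at the last row of the subtree and never touches the sibling that follows it.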


Edited 2021-06-1
