This question has been answered before, and one of the simplest approaches is to get the child elements by tag name (if it is known):
child_elements = element.find_elements_by_tag_name("<tag name>")
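However, for the element pasted below, this returns only 9 of the 25 instances of the tag name. I am new to JavaScript, so I could not zero in on the reason. In this example, I am trying to get the dt tags inside the ol element. The code snippet I am using is: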
par_element = browser.find_element_by_class_name('search-results__result-list')
child_elements = par_element.find_elements_by_tag_name("dt")
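The image below shows the skeleton/structure of the element from the page source (all the div tags have the same structure; one of them is expanded as an example).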
I have also tried getting the class name result-lockup__name directly, and it still returns only 9 out of the 25 instances. What could be the reason?
EDIT
Initially, not all of the elements were loaded, so I had to scroll through the page with
browser.execute_script('window.scrollTo(0,document.body.scrollHeight)')
When the problem occurred again and I could not figure out why, I raised this question. It looks like even the scroll is not helping, as certain elements appear to be hidden.
After manually scrolling through them again, keeping the code in pause, I was able to "enable" them.
Is this a kind of mask to protect sites from being scraped? I now think I would probably have to scroll in increments to reveal them all, but is there a smarter way?
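The elements are loaded dynamically, so you need to scroll the page slowly to get all of the child elements. Try the code below; hopefully it works.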
import time
from selenium.webdriver.common.keys import Keys

element_list = []
while True:
    # Nudge the page down so that lazily rendered results get loaded.
    browser.find_element_by_tag_name("body").send_keys(Keys.DOWN)
    time.sleep(2)
    listlen_before = len(element_list)
    par_element = browser.find_element_by_class_name('search-results__result-list')
    child_elements = par_element.find_elements_by_tag_name("dt")
    for ele in child_elements:
        # Record each result text only once.
        if ele.text not in element_list:
            element_list.append(ele.text)
    listlen_after = len(element_list)
    # Stop as soon as a scroll step reveals nothing new.
    if listlen_before == listlen_after:
        break
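One caveat with the loop above: a single Keys.DOWN press moves the page only a little, so the loop can stop early if one step happens to reveal nothing new. Below is a minimal sketch of a more tolerant variant, not a definitive implementation. The function name collect_texts_while_scrolling and its parameters (step_px, pause_s, max_idle_steps, max_steps) are my own, hypothetical choices. It uses find_elements(by, value) rather than the find_elements_by_* helpers, which were removed in newer Selenium 4 releases, and the string "css selector" is the value of Selenium's By.CSS_SELECTOR.

```python
import time


def collect_texts_while_scrolling(driver, css_selector, step_px=600,
                                  pause_s=1.0, max_idle_steps=3, max_steps=200):
    """Scroll down in fixed increments and collect the unique, non-empty
    texts of elements matching css_selector, in first-seen order.

    All parameter names and defaults are illustrative assumptions.
    """
    texts = []
    seen = set()
    idle_steps = 0
    for _ in range(max_steps):
        found_new = False
        # "css selector" is the value of selenium's By.CSS_SELECTOR.
        for element in driver.find_elements("css selector", css_selector):
            text = element.text
            if text and text not in seen:
                seen.add(text)
                texts.append(text)
                found_new = True
        # Tolerate a few empty steps before giving up, instead of
        # stopping at the first step that reveals nothing new.
        idle_steps = 0 if found_new else idle_steps + 1
        if idle_steps >= max_idle_steps:
            break
        # Scroll by a fixed number of pixels and let the page render.
        driver.execute_script("window.scrollBy(0, arguments[0]);", step_px)
        time.sleep(pause_s)
    return texts
```

With the browser object from the question, it could be called as collect_texts_while_scrolling(browser, ".search-results__result-list dt"). The step size and pause are guesses and will likely need tuning for the page in question.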