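I have written a script in Python to parse the name, tweets, following and followers of the accounts listed in the "view all" section of my Twitter profile page. It currently does its job. However, I have found two problems with this scraper:
Here is what I have written: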
from selenium import webdriver
import time

def twitter_data():
    driver = webdriver.Chrome()
    driver.get('https://twitter.com/?lang=en')
    driver.find_element_by_xpath('//input[@id="signin-email"]').send_keys('username')
    driver.find_element_by_xpath('//input[@id="signin-password"]').send_keys('password')
    driver.find_element_by_xpath('//button[@type="submit"]').click()
    driver.implicitly_wait(15)
    # Clicking the viewall link
    driver.find_element_by_xpath("//small[@class='view-all']//a[contains(@class,'js-view-all-link')]").click()
    time.sleep(10)
    for links in driver.find_elements_by_xpath("//div[@class='stream-item-header']//a[contains(@class,'js-user-profile-link')]"):
        processing_files(links.get_attribute("href"))

# Going on to each profile falling under the viewall section
def processing_files(item_link):
    driver = webdriver.Chrome()
    driver.get(item_link)
    # Getting information of each profile holder
    for prof in driver.find_elements_by_xpath("//div[@class='route-profile']"):
        name = prof.find_elements_by_xpath(".//h1[@class='ProfileHeaderCard-name']//a[contains(@class,'ProfileHeaderCard-nameLink')]")[0]
        tweet = prof.find_elements_by_xpath(".//span[@class='ProfileNav-value']")[0]
        following = prof.find_elements_by_xpath(".//span[@class='ProfileNav-value']")[1]
        follower = prof.find_elements_by_xpath(".//span[@class='ProfileNav-value']")[2]
        print(name.text, tweet.text, following.text, follower.text)

twitter_data()
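I have used both implicitly_wait and time.sleep in my scraper: wherever I found it necessary to make the bot wait longer, I used the latter. Thanks in advance for taking a look at it.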
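On the mix of implicitly_wait and time.sleep: a fixed time.sleep(10) always costs the full ten seconds, even when the page is ready sooner. A polling wait returns as soon as a condition holds. The sketch below is only an illustration of that idea in plain Python; wait_until and its parameters are hypothetical names, not part of Selenium.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() until it returns a truthy value or timeout seconds pass.

    Hypothetical helper for illustration; not a Selenium API.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # Return the truthy value itself (for example a non-empty list).
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)
```

In the scraper, the condition could be a lambda that calls driver.find_elements_by_xpath(...) for the profile links, since an empty list is falsy until the links appear. Selenium also ships its own explicit wait, WebDriverWait in selenium.webdriver.support.ui, which serves the same purpose.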
You can close the pages with driver.quit(), as shown below. This will reduce the number of pages left open in the taskbar.
from selenium import webdriver
import time

def twitter_data():
    driver = webdriver.Chrome()
    driver.get('https://twitter.com/?lang=en')
    driver.find_element_by_xpath('//input[@id="signin-email"]').send_keys('username')
    driver.find_element_by_xpath('//input[@id="signin-password"]').send_keys('password')
    driver.find_element_by_xpath('//button[@type="submit"]').click()
    driver.implicitly_wait(15)
    # Clicking the viewall link
    driver.find_element_by_xpath("//small[@class='view-all']//a[contains(@class,'js-view-all-link')]").click()
    time.sleep(10)
    for links in driver.find_elements_by_xpath("//div[@class='stream-item-header']//a[contains(@class,'js-user-profile-link')]"):
        processing_files(links.get_attribute("href"))
    driver.quit()

# Going on to each profile falling under the viewall section
def processing_files(item_link):
    driver1 = webdriver.Chrome()
    driver1.get(item_link)
    # Getting information of each profile holder
    for prof in driver1.find_elements_by_xpath("//div[@class='route-profile']"):
        name = prof.find_elements_by_xpath(".//h1[@class='ProfileHeaderCard-name']//a[contains(@class,'ProfileHeaderCard-nameLink')]")[0]
        tweet = prof.find_elements_by_xpath(".//span[@class='ProfileNav-value']")[0]
        following = prof.find_elements_by_xpath(".//span[@class='ProfileNav-value']")[1]
        follower = prof.find_elements_by_xpath(".//span[@class='ProfileNav-value']")[2]
        print(name.text, tweet.text, following.text, follower.text)
    driver1.quit()

twitter_data()
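If a lookup raises before quit() is reached, the browser window still stays open. One possible variation, sketched here under assumptions, wraps the driver in a context manager so quit() always runs, and reuses a single driver for every profile link instead of launching a new Chrome per profile. The names managed_driver, visit_all and FakeDriver are hypothetical; FakeDriver only stands in for webdriver.Chrome so the sketch runs without a browser.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(factory):
    """Create a driver with factory() and always call quit() on exit."""
    driver = factory()
    try:
        yield driver
    finally:
        # Runs on normal exit and when the body raises.
        driver.quit()


class FakeDriver:
    """Hypothetical stand-in for webdriver.Chrome (no browser needed)."""

    def __init__(self):
        self.visited = []
        self.closed = False

    def get(self, url):
        self.visited.append(url)

    def quit(self):
        self.closed = True


def visit_all(urls, factory=FakeDriver):
    """Open every URL in one shared driver, then quit it exactly once."""
    with managed_driver(factory) as driver:
        for url in urls:
            driver.get(url)
            # The per-profile scraping would go here.
    return driver
```

With Selenium installed, passing webdriver.Chrome as factory should behave the same way, since the helper relies only on get() and quit(). The profile links would need to be collected as plain href strings before the loop, because navigating away invalidates the link elements found on the original page.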