我编写了一个代码,使用Selenium Webdriver通过URL列表下载文件,但是由于某种原因,它没有下载任何文件到我的assignedn目录中。当我仅一次下载一个代码时,该代码运行良好,但是当我使用for循环时,它不起作用。
这是一个示例URL:https : //www.regulations.gov/contentStreamer?documentId=WHD-2020-0007-1730&attachmentNumber=1&contentType=pdf
这是我的代码:
download_dir = '/Users/datawizard/files/'
for web in down_link:
try:
options = webdriver.ChromeOptions()
options.add_argument('headless')
options.add_experimental_option("prefs", {
"download.default_directory": '/Users/clinton/GRA_2021/scraping_project/pdf/',
"download.prompt_for_download": False,
"download.directory_upgrade": True,
# "safebrowsing.enabled": True,
"plugins.always_open_pdf_externally": True
})
driver = webdriver.Chrome(chrome_options=options)
driver.command_executor._commands["send_command"] = ("POST", '/session/$sessionId/chromium/send_command')
params = {'cmd': 'Page.setDownloadBehavior', 'params': {'behavior': 'allow', 'downloadPath': download_dir}}
command_result = driver.execute("send_command", params)
driver.get(url)
except:
print(str(web)+"Link cannot be open")
我想知道我的代码是否做错了什么,因为当我运行上面的代码时它没有给我任何错误。
您不需要Selenium来下载文件,可以使用该request
库轻松下载文件
import requests
for web in down_link:
fileName = YOUR_DOWNLOAD_PATH + web.split("=")[1].split("&")[0] + ".pdf" #I created a filename
r = requests.get(web, stream=True)
with open(fileName, 'wb') as f:
for chunk in r.iter_content():
f.write(chunk)
根据硒更新了答案
#replace the below value with your urls list
down_link = [
'https://www.regulations.gov/contentStreamer?documentId=WHD-2020-0007-1730&attachmentNumber=1&contentType=pdf',
'https://www.regulations.gov/contentStreamer?documentId=WHD-2020-0007-1730&attachmentNumber=1&contentType=pdf']
download_dir = "/Users/datawizard/files/"
options = webdriver.ChromeOptions()
options.add_argument('headless')
options.add_experimental_option("prefs", {
"download.default_directory": download_dir,
"download.prompt_for_download": False,
"download.directory_upgrade": True,
"plugins.always_open_pdf_externally": True
})
driver = webdriver.Chrome(chrome_options=options)
for web in down_link:
driver.get(web)
time.sleep(5) #wait for the download to end, a better handling it's to check if the file exists
driver.quit()
如果您的文件没有唯一的文件名-上面的代码将用下载的文件替换现有文件。
本文收集自互联网,转载请注明来源。
如有侵权,请联系[email protected] 删除。
我来说两句