Here is my code, which checks a number of URLs for a specific keyword and writes "OK" or "NOT OK" to an output file depending on whether the keyword is found:
import requests
import pandas as pd
from bs4 import BeautifulSoup

df = pd.read_csv('/path/to/input.csv')
urls = df.T.values.tolist()[2]
myList = []
for url in urls:
    url_1 = url
    keyword = 'myKeyword'
    res = requests.get(url_1)
    finalresult = print(keyword in res.text)
    if finalresult == False:
        myList.append("NOT OK")
    else:
        myList.append("OK")
df["myList"] = pd.DataFrame(myList, columns=['myList'])
df.to_csv('/path/to/output.csv', index=False)
However, as soon as any one of the URLs is down and raises a connection error, the script stops with the following traceback:
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='argos-yoga.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x122582d90>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known'))
How can I ignore such errors and let the script continue scanning? Can someone help me? Thanks.
Try a try..except around just the requests.get() and res.text calls. For example:
import requests
import pandas as pd
from bs4 import BeautifulSoup

df = pd.read_csv('/path/to/input.csv')
urls = df.T.values.tolist()[2]
myList = []
for url in urls:
    url_1 = url
    keyword = 'myKeyword'
    try:  # <-- put try..except here
        res = requests.get(url_1)
        finalresult = keyword in res.text  # <-- remove print()
    except:
        finalresult = False
    if finalresult == False:
        myList.append("NOT OK")
    else:
        myList.append("OK")
df["myList"] = pd.DataFrame(myList, columns=['myList'])
df.to_csv('/path/to/output.csv', index=False)
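As a side note, wrapping the list in a DataFrame before assigning it as a column is unnecessary; pandas accepts a plain list directly, as long as its length matches the number of rows. A minimal sketch with made-up data (the column values here are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({"url": ["http://a.example", "http://b.example"]})
statuses = ["OK", "NOT OK"]

# A plain list works; its length must equal len(df)
df["myList"] = statuses
print(df["myList"].tolist())
```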
EDIT: to put "Down" in the list when an error occurs:
for url in urls:
    url_1 = url
    keyword = 'myKeyword'
    try:  # <-- put try..except here
        res = requests.get(url_1)
        if keyword in res.text:
            myList.append("OK")
        else:
            myList.append("NOT OK")
    except:
        myList.append("Down")
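One caveat: a bare `except:` also swallows unrelated errors (e.g. a typo inside the try block would silently show up as "Down"). A safer variant catches only `requests.exceptions.RequestException`, the base class for all requests failures, and adds a timeout so a hanging server cannot stall the scan. A sketch under those assumptions (the helper name `check_url` and the injectable `get` parameter are illustrative, not from the original code):

```python
import requests

def check_url(url, keyword, get=requests.get, timeout=10):
    """Return 'OK' if keyword is found, 'NOT OK' if not, 'Down' on any
    request failure. `get` is injectable so the logic can be tested
    without network access."""
    try:
        res = get(url, timeout=timeout)
        return "OK" if keyword in res.text else "NOT OK"
    except requests.exceptions.RequestException:
        return "Down"
```

The loop body then collapses to `myList.append(check_url(url, 'myKeyword'))`.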