我制作了这个脚本来检查 github 配置文件是否包含电子邮件并将它们排序为 2 个单独的列表,它还没有完成需要优化和摆脱额外的东西。
主要问题是我使用 python 在终端上运行,而我的系统是 2.7,当我想用 python 3 运行它时,出现错误,我尝试调试它们但仍然无法解决问题。
这是我的代码:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from timeit import default_timer as timer
start = timer()
timeout = 1
haveEmail = []
noEmail = []
options = webdriver.ChromeOptions()
options.add_argument("headless")
#YOU NEED TO PROVIDE THE LOCATION OF YOU CHROME DRIVER
browser = webdriver.Chrome('/home/djurovic/Desktop/Linux ChromeDriver/chromedriver', chrome_options=options)
def login(email):
browser.get(email)
signInXpath = '//a[@class="HeaderMenu-link no-underline mr-3"]'
signInElement = WebDriverWait(browser, timeout).until(lambda browser: browser.find_element_by_xpath(signInXpath))
signInElement.click()
#ENTER YOUR LOGIN INFORMATION
username = ''
password = ''
userNameXpath = '//input[@class="form-control input-block"]'
passwordXpath = '//input[@class="form-control form-control input-block"]'
loginButtonXpath = '//input[@class="btn btn-primary btn-block"]'
userNameElement = WebDriverWait(browser, timeout).until(lambda browser: browser.find_element_by_xpath(userNameXpath))
passwordElement = WebDriverWait(browser, timeout).until(lambda browser: browser.find_element_by_xpath(passwordXpath))
userNameElement.clear()
userNameElement.send_keys(username)
passwordElement.clear()
passwordElement.send_keys(password)
loginButtonElement = WebDriverWait(browser, timeout).until(lambda browser: browser.find_element_by_xpath(loginButtonXpath))
loginButtonElement.click()
def checkEmail(profile):
browser.get(profile)
emailXPath = '//a[@class="u-email"]'
try:
emailElement = WebDriverWait(browser, timeout).until(lambda browser: browser.find_element_by_xpath(emailXPath))
haveEmail.append(profile)
except:
noEmail.append(profile)
def parse():
i = 1
document = open('profiles.txt', 'rb')
for profile in document:
if i == 1:
login(profile)
i = i + 1
checkEmail(profile)
browser.close()
parse()
print(haveEmail)
print(noEmail)
for item in haveEmail:
output_file = open("profiles_with_email.txt", 'a')
for i in str(len(haveEmail)):
output_file.write(item + "\n")
output_file.close()
for item in noEmail:
output_file = open("profiles_with_no_email.txt", 'a')
for i in str(len(noEmail)):
output_file.write(item + "\n")
output_file.close()
elapsed_time = timer() - start
print("Script finished in " + str(elapsed_time))
以下是我在通过 python 3 运行时遇到的错误:
Traceback (most recent call last):
File "github.py", line 77, in <module>
parse()
File "github.py", line 72, in parse
login(profile)
File "github.py", line 33, in login
browser.get(email)
File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/remote/webdriver.py", line 333, in get
self.execute(Command.GET, {'url': url})
File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/remote/webdriver.py", line 319, in execute
response = self.command_executor.execute(driver_command, params)
File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/remote/remote_connection.py", line 372, in execute
data = utils.dump_json(params)
File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/remote/utils.py", line 33, in dump_json
return json.dumps(json_struct)
File "/usr/lib/python3.6/json/__init__.py", line 231, in dumps
return _default_encoder.encode(obj)
File "/usr/lib/python3.6/json/encoder.py", line 199, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/usr/lib/python3.6/json/encoder.py", line 257, in iterencode
return _iterencode(o, 0)
File "/usr/lib/python3.6/json/encoder.py", line 180, in default
o.__class__.__name__)
TypeError: Object of type 'bytes' is not JSON serializable
任何形式的帮助都会有很大帮助!!!
感谢@ZeroQ 和@Divyanshu Srivastava
我试过@Divyanshu Srivastava,但该函数不能接受额外的参数,所以我刚刚在我的 parse() 函数中添加了这一行:
profile = profile.decode('utf-8')
它是一种魅力。再次感谢。
本文收集自互联网,转载请注明来源。
如有侵权,请联系[email protected] 删除。
我来说两句