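I'm looking for some help scraping a website that requires a login. Essentially, the site is for getting trading card prices (which I believe come from eBay), but in a format that allows searching further back than the 90 days available on eBay's own site. The login URL is https://members.pwccmarketplace.com/login and the URL I search is https://members.pwccmarketplace.com/. I searched previous posts and found one that I thought I could replicate, but had no success. The code is below. Any help, including whether this approach is workable at all, would be greatly appreciated.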
#https://stackoverflow.com/questions/47438699/scraping-a-website-with-python-3-that-requires-login
import requests
from lxml import html
from bs4 import BeautifulSoup
import unicodecsv as csv
import os
import sys
import io
import time
import datetime
from datetime import datetime
from datetime import date
import pandas as pd
import numpy as np
from time import sleep
from random import randint
from urllib.parse import quote
Product_name = []
Price = []
Date_sold = []
url = "https://www.pwccmarketplace.com/login"
values = {"email": "your_email@example.com",
          "password": "password"}
session = requests.Session()
r = session.post(url, data=values)
Search_name = input("Search for: ")
Exclude_terms = input("Exclude these terms (- infront of all, no spaces): ")
qstr = quote(Search_name)
qstrr = quote(Exclude_terms)
Number_pages = int(input("Number of pages you want searched (Number -1): "))
pages = np.arange(1, Number_pages)
for page in pages:
    params = {"Category": 6, "deltreeid": 6, "do": "Delete Tree"}
    url = "https://www.pwccmarketplace.com/market-price-research?q=" + qstr + "+" + qstrr + "&year_min=2004&year_max=2020&price_min=0&price_max=10000&sort_by=date_desc&sale_type=auction&items_per_page=250&page=" + str(page)
    result = session.get(url, data=params)
    soup = BeautifulSoup(result.text, "lxml")
    search = soup.find_all('tr')
    sleep(randint(2,10))
    for container in search:
The code continues, but the rest is not relevant to this question.
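As a side note on the loop above, the search URL can be assembled with `urlencode` instead of string concatenation, which takes care of escaping the search terms. This is only a minimal sketch: the helper name `build_search_url` is hypothetical, the parameter values are copied from the URL in the question, and it assumes the site treats `+` and `%20` in the query string the same way.

```python
from urllib.parse import urlencode

# Base URL taken from the question's code.
BASE_URL = "https://www.pwccmarketplace.com/market-price-research"


def build_search_url(search_name, exclude_terms, page):
    """Return the research URL for one page of results (hypothetical helper)."""
    # Join the search terms and the "-term" exclusions with a space;
    # urlencode() encodes that space as "+", as in the original URL.
    query = " ".join(part for part in (search_name, exclude_terms) if part)
    params = {
        "q": query,
        "year_min": 2004,
        "year_max": 2020,
        "price_min": 0,
        "price_max": 10000,
        "sort_by": "date_desc",
        "sale_type": "auction",
        "items_per_page": 250,
        "page": page,
    }
    return BASE_URL + "?" + urlencode(params)
```

Inside the loop this would be used as `result = session.get(build_search_url(Search_name, Exclude_terms, page))`.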
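When you log in, a token is sent in the payload of the POST https://members.pwccmarketplace.com/login request. The token is located in an input tag on the login page, so the page has to be fetched first, in the same session, before the credentials are posted. It can be scraped with BeautifulSoup: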
import requests
from bs4 import BeautifulSoup
session = requests.Session()
email = "your_email@example.com"
password = "your_password"
r = session.get("https://members.pwccmarketplace.com/login")
soup = BeautifulSoup(r.text, "html.parser")
token = soup.find("input", { "name": "_token"})["value"]
r = session.post(
    "https://members.pwccmarketplace.com/login",
    data = {
        "_token": token,
        "redirect": "",
        "email": email,
        "password": password,
        "remember": "true"
    }
)
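If the login page ever changes, `soup.find(...)` returns `None` and the lookup of `["value"]` fails with a confusing `TypeError`. A slightly more defensive variant of the token step might look like the sketch below. The helper name `build_login_payload` is hypothetical, and the form fields are simply the ones used in the code above; nothing here has been checked against the live site.

```python
from bs4 import BeautifulSoup


def build_login_payload(login_html, email, password):
    """Build the POST payload from the login page HTML (hypothetical helper)."""
    soup = BeautifulSoup(login_html, "html.parser")
    # The token is expected in <input name="_token" value="...">.
    token_input = soup.find("input", {"name": "_token"})
    if token_input is None or not token_input.get("value"):
        raise ValueError("no _token input found on the login page")
    return {
        "_token": token_input["value"],
        "redirect": "",
        "email": email,
        "password": password,
        "remember": "true",
    }
```

With the session from above, the call would be `payload = build_login_payload(session.get(login_url).text, email, password)` followed by `session.post(login_url, data=payload)`, where `login_url` is the login address. Reusing the same `session` for the later search requests keeps the cookies set at login.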