Python web scraping of a website that requires login

Posted by debugcn in Dev

pythondazza

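I'm looking for help scraping a website that requires login. Essentially, the site provides trading card prices (which I believe are sourced from eBay), but in a format that allows searches going back further than the 90 days available on eBay's own website. The login URL is https://members.pwccmarketplace.com/login and the URL I search is https://members.pwccmarketplace.com/. I searched previous posts and found one that I thought I could replicate, but without success. The code is below, whether it works or not; any help would be appreciated.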

#https://stackoverflow.com/questions/47438699/scraping-a-website-with-python-3-that-requires-login
import requests
from lxml import html
from bs4 import BeautifulSoup
import unicodecsv as csv
import os
import sys
import io
import time
import datetime
from datetime import datetime
from datetime import date
import pandas as pd
import numpy as np
from time import sleep
from random import randint
from urllib.parse import quote

Product_name = []
Price = []
Date_sold = []

url = "https://www.pwccmarketplace.com/login"
values = {"email": "[email protected]",
          "password": "password"}

session = requests.Session()

r = session.post(url, data=values)

Search_name = input("Search for: ")
Exclude_terms = input("Exclude these terms (- in front of each, no spaces): ")
qstr = quote(Search_name)
qstrr = quote(Exclude_terms)
Number_pages = int(input("Number of pages you want searched (Number -1): "))

pages = np.arange(1, Number_pages)

for page in pages:

    params = {"Category": 6, "deltreeid": 6, "do": "Delete Tree"}
    url = "https://www.pwccmarketplace.com/market-price-research?q=" + qstr + "+" + qstrr + "&year_min=2004&year_max=2020&price_min=0&price_max=10000&sort_by=date_desc&sale_type=auction&items_per_page=250&page=" + str(page)

    result = session.get(url, data=params)

    soup = BeautifulSoup(result.text, "lxml")

    search = soup.find_all('tr')

    sleep(randint(2,10))

    for container in search:

The code continues, but the rest is not relevant to this question.

Bertrand Martel

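A token is sent in the payload when performing POST https://members.pwccmarketplace.com/login. This token is located in an input tag on the login page, so you can fetch that page first and scrape the token with BeautifulSoup: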

import requests
from bs4 import BeautifulSoup

session = requests.Session()

email = "[email protected]"
password = "your_password"

r = session.get("https://members.pwccmarketplace.com/login")

soup = BeautifulSoup(r.text, "html.parser")
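# The token is in a hidden <input name="_token"> on the login page
token = soup.find("input", {"name": "_token"})["value"]

# Post the credentials together with the token, reusing the same session
r = session.post(
    "https://members.pwccmarketplace.com/login",
    data={
        "_token": token,
        "redirect": "",
        "email": email,
        "password": password,
        "remember": "true",
    },
)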
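As a rough sketch of how the login step above could be combined with the search loop from the question, the helpers below separate the token extraction from the request logic and build the search URL with `urlencode` instead of string concatenation. The function names (`extract_token`, `build_search_url`, `login`) and the timeout value are hypothetical, introduced only for illustration. The form field names come from the answer and the query parameters from the question's URL; whether the site still accepts them is an assumption.

```python
from urllib.parse import urlencode

import requests
from bs4 import BeautifulSoup

LOGIN_URL = "https://members.pwccmarketplace.com/login"
# Search endpoint copied from the question; assumed, not verified.
SEARCH_URL = "https://www.pwccmarketplace.com/market-price-research"


def extract_token(html):
    """Return the value of the hidden _token input, or None if it is missing."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("input", {"name": "_token"})
    return tag.get("value") if tag else None


def build_search_url(search, exclude, page):
    """Rebuild the question's search URL, letting urlencode handle escaping."""
    query = {
        "q": f"{search} {exclude}".strip(),
        "year_min": 2004,
        "year_max": 2020,
        "price_min": 0,
        "price_max": 10000,
        "sort_by": "date_desc",
        "sale_type": "auction",
        "items_per_page": 250,
        "page": page,
    }
    return f"{SEARCH_URL}?{urlencode(query)}"


def login(session, email, password):
    """Fetch the login page, then post the credentials with its token."""
    page = session.get(LOGIN_URL, timeout=30)
    token = extract_token(page.text)
    if token is None:
        raise RuntimeError("no _token input found on the login page")
    response = session.post(
        LOGIN_URL,
        data={
            "_token": token,
            "redirect": "",
            "email": email,
            "password": password,
            "remember": "true",
        },
        timeout=30,
    )
    # Only catches HTTP errors; a rejected login may still return 200.
    response.raise_for_status()
    return response
```

With these in place, one would create a `requests.Session()`, call `login(session, email, password)` once, and then call `session.get(build_search_url(Search_name, Exclude_terms, page))` inside the page loop, so the cookies set at login are sent with every search request. Note that `urlencode` encodes spaces as `+`, which is equivalent in a query string to the `%20` produced by `quote` in the question's code.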


Edited on 2021-04-2
