Getting links and information from them with BeautifulSoup

Ebanoz

I want to scrape a website. Each page of the site shows previews of 10 complaints. I wrote this script to collect the links to those 10 complaints and some information from each link. When I run the script, I get the error "RecursionError: maximum recursion depth exceeded". Can someone tell me what's wrong? Thanks in advance!!

from requests import get
from bs4 import BeautifulSoup
import pandas as pd

# Create list objects for each information section
C_date = []
C_title = []
C_text = []
U_name = []
U_id = []
C_count = []
R_name = []
R_date = []
R_text = []

# Get 10 links for preview of complaints
def getLinks(url):
    response = get(url)
    html_soup = BeautifulSoup(response.text, 'html.parser')
    c_containers = html_soup.find_all('div', class_='media')
    # Store wanted links in a list
    allLinks = []

    for link in c_containers:
        find_tag = link.find('a')
        find_links = find_tag.get('href')
        full_link = "".join((url, find_links))
        allLinks.append(full_link)
    # Get total number of links
    print(len(allLinks))
    return allLinks

def GetData(Each_Link):
    each_complaint_page = get(Each_Link)
    html_soup = BeautifulSoup(each_complaint_page.text, 'html.parser')
    # Get date of complaint
    dt = html_soup.main.find('span')
    date = dt['title']
    C_date.append(date)
    # Get Title of complaint
    TL = html_soup.main.find('h1', {'class': 'title'})
    Title = TL.text
    C_title.append(Title)
    # Get main text of complaint
    Tx = html_soup.main.find('div', {'class': 'description'})
    Text = Tx.text
    C_text.append(Text)
    # Get user name and id
    Uname = html_soup.main.find('span', {'class': 'user'})
    User_name = Uname.span.text
    User_id = Uname.attrs['data-memberid']
    U_name.append(User_name)
    U_id.append(User_id)
    # Get view count of complaint
    Vcount = html_soup.main.find('span', {'view-count-detail'})
    View_count = Vcount.text
    C_count.append(View_count)
    # Get reply for complaint
    Rpnm = html_soup.main.find('h4', {'name'})
    Reply_name = Rpnm.next
    R_name.append(Reply_name)
    # Get reply date
    Rpdt = html_soup.main.find('span', {'date-tips'})
    Reply_date = Rpdt.attrs['title']
    R_date.append(Reply_date)
    # Get reply text
    Rptx = html_soup.main.find('p', {'comment-content-msg company-comment-msg'})
    Reply_text = Rptx.text
    R_text.append(Reply_text)


link_list = getLinks('https://www.sikayetvar.com/arcelik')

for i in link_list:
    z = GetData(i)
    print(z)

PS: My next step is to put all of this information into a single dataframe.
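For what it's worth, parallel lists like the ones collected above can be combined into a dataframe in one step. A minimal sketch with made-up sample values (the real lists must all end up the same length, one entry per complaint):

```python
import pandas as pd

# Hypothetical sample values standing in for the scraped C_date, C_title, U_name lists
C_date = ["2020-01-01", "2020-01-02"]
C_title = ["Broken fridge", "Late delivery"]
U_name = ["user1", "user2"]

# Each key becomes a column; each list supplies that column's values
df = pd.DataFrame({
    "date": C_date,
    "title": C_title,
    "user": U_name,
})
print(df.shape)  # (2, 3)
```

If some pages are missing a field (e.g. no reply yet), appending a placeholder such as `None` keeps the lists aligned so the constructor does not fail.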

Richard Inglis

Your GetData() method calls itself with no base case, which causes infinite recursion:

def GetData(data):
    for i in GetData(data):

You're also calling `response = get(i)` but then ignoring the result... perhaps you meant:

def GetData(link):
    i = get(link)
    ...
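In other words, `GetData` should fetch and parse each page exactly once and return its values, while the loop over the links stays outside the function. A minimal non-recursive sketch of that shape (the site-specific BeautifulSoup selectors are omitted, and `fetch` is a stand-in for `requests.get(link).text` so the example runs without network access):

```python
def GetData(link, fetch):
    # Fetch the page once -- no recursive call back into GetData
    html = fetch(link)
    # ... parse `html` with BeautifulSoup here and extract the fields ...
    return {"link": link, "length": len(html)}

def scrape_all(links, fetch):
    # One call per link; returning values beats appending to globals
    return [GetData(link, fetch) for link in links]

# Stub fetcher for demonstration (assumption: the real script would
# pass something like `lambda link: get(link).text`)
fake_fetch = lambda link: "<html>complaint page</html>"

results = scrape_all(["https://example.com/a", "https://example.com/b"], fake_fetch)
print(len(results))  # 2
```

Returning a dict per page also makes the dataframe step trivial: `pd.DataFrame(results)`.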
