Python BeautifulSoup spider is not working

Posted on debugcn, in Dev

Hiroyuki Nuri

Hello, I am learning how to scrape elements with Python. I tried to get the titles from a web page (local.ch), but my code does not work and I do not know why.

Here is the Python code:

import requests
from bs4 import BeautifulSoup

def spider(max_pages):
    page = 2
    while page < max_pages:
        url = 'http://yellow.local.ch/fr/q/Morges/Bar.html?page=' + str(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text)
        for link in soup.findAll('a', {'class':'details-entry-title-link'}):
            title = link.string
            print(title)
        page += 1

spider(3)

I am sure the code is correct. PyCharm reports no errors. Why doesn't it work?

Renae Lider

Your code has a major bug.

page = 1
while page < max_pages
....
spider(1)

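With page = 1 and spider(1), the condition page < max_pages is never true, so the rest of the code never runs! The other bugs are encoding errors and a warning about an unspecified parser.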
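The loop condition can be checked in isolation, without any network access. The sketch below is only an illustration: `pages_visited` and its `inclusive` flag are hypothetical names that do not appear in the original code. It records which page numbers the loop would request.

```python
def pages_visited(max_pages, inclusive):
    """Return the page numbers the while loop would request, starting at 1."""
    visited = []
    page = 1
    # inclusive=False mimics `page < max_pages`; inclusive=True mimics `page <= max_pages`
    while (page <= max_pages) if inclusive else (page < max_pages):
        visited.append(page)
        page += 1  # the increment belongs inside the loop body
    return visited


print(pages_visited(1, inclusive=False))  # [] : the loop body never runs
print(pages_visited(1, inclusive=True))   # [1]
print(pages_visited(3, inclusive=True))   # [1, 2, 3]
```

With the strict comparison, a call equivalent to spider(1) requests no pages at all, which matches the silent behavior described in the question. The answer's full corrected spider follows.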

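import requests
from bs4 import BeautifulSoup

def spider(max_pages):
    page = 1
    # Use <= so that spider(1) still fetches page 1
    while page <= max_pages:
        url = 'http://yellow.local.ch/fr/q/Morges/Bar.html?page=' + str(page)
        source_code = requests.get(url)
        # Encode the response text to UTF-8 bytes before parsing
        plain_text = source_code.text.encode("utf-8")
        # Name the parser explicitly to avoid the "no parser specified" warning
        soup = BeautifulSoup(plain_text, 'html.parser')
        for link in soup.findAll('a', {'class':'details-entry-title-link'}):
            title = link.string
            # Printing the encoded bytes shows a b'...' prefix
            print(title.encode("utf-8"))
        # Advance to the next page inside the while loop
        page += 1

spider(1)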

Note the "utf-8" encoding parts. Encoding produces binary output, as the b prefix shows. Without this step, the print() function raises an error. The same change is made on the line plain_text = source_code.text.encode("utf-8").
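To see what the b prefix means, here is a minimal sketch using a made-up title (the string is an assumption, not data from local.ch). In Python 3, calling encode on a str returns a bytes object, and printing that object shows its escaped byte values instead of the accented character.

```python
# Hypothetical listing title containing a non-ASCII character (é)
title = "Caf\u00e9 du Port"

encoded = title.encode("utf-8")

print(type(encoded).__name__)            # bytes
print(encoded)                           # b'Caf\xc3\xa9 du Port'
print(encoded.decode("utf-8") == title)  # True: decoding restores the text
```

Whether printing the unencoded string fails depends on the console's encoding, so the error mentioned above may not appear on every system.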

Another bug is the wrong indentation of the page += 1 line. It must be inside the while loop.
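One possible way to sidestep both the loop condition and the misplaced increment is to let range produce the page numbers, so there is no counter to update by hand. This is a sketch of an alternative, not the approach used in the answer; `BASE_URL` and `page_urls` are hypothetical names, and only the URL itself comes from the question.

```python
# URL prefix taken from the question; the page number is appended to it
BASE_URL = "http://yellow.local.ch/fr/q/Morges/Bar.html?page="


def page_urls(max_pages):
    """Build the URLs for pages 1 through max_pages, inclusive."""
    return [BASE_URL + str(page) for page in range(1, max_pages + 1)]


for url in page_urls(2):
    print(url)
```

Each URL could then be passed to requests.get in turn, exactly as the spider does.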


Edited on 2021-06-04
