Web scraping - challenges articulating hierarchy in my code

William

Objective:
I am trying to scrape hundreds of web pages, specifically the ingredients for the recipe on each. Take one example, the recipe for an Egg Sandwich (url). I'm using several Python dependencies, including BeautifulSoup, splinter.Browser, and ChromeDriverManager.

Expected output:
Once I have scraped the ingredients, I would like to save them in a dictionary keyed by recipe name. Example below:

recipes = {"quick_and_easy_egg_salad_sandwich_recipe":
['1-2 tablespoons mayonnaise (to taste)',
 '2 tablespoons chopped celery',
 '2 slices white, wheat, multigrain, or rye bread, toasted or plain']}

What I've achieved:
1. I have been able to determine roughly (through Web Inspector) what I need to focus on.
It looks like each ingredient has its own <li class='ingredient'>; however, I have either misinterpreted the hierarchy or my code is incorrect.

2. My code is as follows:

import time
from bs4 import BeautifulSoup
from splinter import Browser
from webdriver_manager.chrome import ChromeDriverManager

executable_path = {'executable_path': ChromeDriverManager().install()}
browser = Browser('chrome', **executable_path)

webpage_url = 'https://www.simplyrecipes.com/recipes/egg_salad_sandwich/'
browser.visit(webpage_url)
time.sleep(1)
website_html = browser.html
website_soup = BeautifulSoup(website_html, 'html.parser')
ingredients = website_soup.find('h3', class_="Ingredients")
ingredientsList = ingredients.find('li', class_ = "ingredient")
print({ingredients})

When I attempt to print {ingredients} I get an AttributeError: 'NoneType' object has no attribute 'find'

I know my code is flawed; however, I just don't know how to approach this and wondered if anyone has any suggestions.

sushanth

Try this:

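import requests
from bs4 import BeautifulSoup

resp = requests.get("https://www.simplyrecipes.com/recipes/egg_salad_sandwich/")

soup = BeautifulSoup(resp.text, "html.parser")
# Container div holding the recipe title and the ingredient list
div_ = soup.find("div", attrs={"class": "recipe-callout"})

# Key: the <h2> title with whitespace replaced by underscores;
# value: the text of every <li class="ingredient"> inside the container
recipes = {"_".join(div_.find("h2").text.split()):
               [x.text for x in div_.find_all("li", attrs={"class": "ingredient"})]}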
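The AttributeError in the question comes from find() returning None when nothing matches (here, no <h3> with class "Ingredients"), so the chained .find() call fails. Since the goal is hundreds of pages, it may help to wrap the parsing in a function that checks for None and skips pages that do not match. The following is only a rough sketch: the function names extract_ingredients and build_recipes and the SAMPLE markup are hypothetical, the markup merely imitates the recipe-callout structure assumed above, and fetching is left out (each page is passed in as an HTML string, e.g. resp.text).

```python
from bs4 import BeautifulSoup


def extract_ingredients(html):
    """Return (recipe_key, ingredients) for one page, or None if no recipe is found."""
    soup = BeautifulSoup(html, "html.parser")
    # find() returns None when nothing matches, so check before chaining calls.
    callout = soup.find("div", class_="recipe-callout")
    if callout is None:
        return None
    heading = callout.find("h2")
    if heading is None:
        return None
    # Lowercase the title and join its words with underscores to form the key.
    key = "_".join(heading.get_text().lower().split())
    items = [li.get_text(strip=True)
             for li in callout.find_all("li", class_="ingredient")]
    return key, items


def build_recipes(pages):
    """Build {recipe_key: [ingredient, ...]} from an iterable of HTML strings."""
    recipes = {}
    for html in pages:
        result = extract_ingredients(html)
        if result is not None:  # skip pages without the expected structure
            key, items = result
            recipes[key] = items
    return recipes


# Hypothetical sample markup imitating the structure assumed in the answer.
SAMPLE = """
<div class="recipe-callout">
  <h2>Egg Salad Sandwich</h2>
  <ul>
    <li class="ingredient">2 tablespoons chopped celery</li>
    <li class="ingredient">1-2 tablespoons mayonnaise (to taste)</li>
  </ul>
</div>
"""

recipes = build_recipes([SAMPLE, "<h3>No recipe here</h3>"])
print(recipes)
```

For real pages you would pass requests.get(url).text for each url in your list; the second sample page above is skipped rather than raising an error.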
