Micah D. Cochran, MSCS

scrape-schema-recipe Library

scrape-schema-recipe is a Python library that scrapes recipes from websites and HTML files. The recipes must be marked up as structured data in the schema.org/Recipe format. The library returns them as a list of dictionaries, since a single webpage (a cookbook page, for example) can contain multiple recipes.

Screenshot of a Chicken and Black Bean Salsa Burritos recipe from the MedlinePlus website.

Below is the Python dictionary of ‘Chicken and Black Bean Salsa Burritos’:


{'@context': 'https://schema.org/',
 '@type': 'Recipe',
 'author': {'@type': 'Organization', 'name': 'Food Hero'},
 'cookTime': 'PT30M',
 'datePublished': '2021-04-01',
 'description': 'Fresh lemon juice and green onions bring out a zesty flavor '
                'in these oven-baked chicken and black bean salsa burritos. '
                'Pepper Jack cheese adds a nice kick!',
 'image': ['https://medlineplus.gov/images/recipe_chickenandblackbeansalsaburritos.jpg',
           'https://medlineplus.gov/images/recipe_chickenandblackbeansalsaburritos_fb.jpg'],
 'keywords': 'lemon juice,green onions,zesty,oven,chicken,black '
             'bean,salsa,pepper jack cheese,lunch,dinner',
 'name': 'Chicken and Black Bean Salsa Burritos',
 'nutrition': {'@type': 'NutritionInformation',
               'calories': '260 calories',
               'carbohydrateContent': '24 grams',
               'cholesterolContent': '55 milligrams',
               'fatContent': '8 grams',
               'fiberContent': '4 grams',
               'proteinContent': '22 grams',
               'saturatedFatContent': '3.5 grams',
               'servingSize': '1/2 burrito (161 grams)',
               'sodiumContent': '410 milligrams'},
 'prepTime': 'PT20M',
 'recipeCategory': 'lunch,dinner',
 'recipeCuisine': 'American',
 'recipeIngredient': ['1 can (15 ounces) black beans, drained and rinsed',
                      '2 green onions, chopped',
                      '1 Tablespoon lemon juice',
                      '1/4 teaspoon ground cumin',
                      '1/2 teaspoon salt, divided',
                      '4 boneless, skinless chicken breasts',
                      '1/4 teaspoon chili powder',
                      '1/4 teaspoon pepper',
                      '1 cup shredded cheese (try cheddar, pepper jack, or '
                      'Mexican blend)',
                      '4 (9-inch) flour tortillas'],
 'recipeInstructions': [{'@type': 'HowToStep',
                         'text': 'Preheat oven to 350 °F.'},
                        {'@type': 'HowToStep',
                         'text': 'In a small bowl, combine the beans, green '
                                 'onions, lemon juice, cumin, and 1/4 teaspoon '
                                 'salt.'},
                        {'@type': 'HowToStep',
                         'text': 'Rub the chicken breasts with the chili '
                                 'powder, pepper and the remaining 1/4 '
                                 'teaspoon salt.'},
                        {'@type': 'HowToStep',
                         'text': 'Cook the chicken in a skillet over '
                                 'medium-high heat (350 °F in an electric '
                                 'skillet) for 5 to 7 minutes. Turn it over '
                                 'and cook until the internal temperature of '
                                 'the thickest part reaches 165 °F using a '
                                 'food thermometer, about 5 to 7 minutes '
                                 'longer.'},
                        {'@type': 'HowToStep',
                         'text': 'Let chicken cool; slice into strips or '
                                 'chunks.'},
                        {'@type': 'HowToStep',
                         'text': 'Divide cheese evenly between tortillas. Top '
                                 'the cheese with equal amounts of chicken and '
                                 'black bean salsa mixture.'},
                        {'@type': 'HowToStep',
                         'text': 'Roll up the burritos and wrap each one in '
                                 'foil.'},
                        {'@type': 'HowToStep',
                         'text': 'Bake burritos until the cheese melts, about '
                                 '15 minutes.'},
                        {'@type': 'HowToStep',
                         'text': 'Refrigerate leftovers within 2 hours.'}],
 'recipeYield': '8 servings',
 'totalTime': 'PT50M',
 'url': 'https://medlineplus.gov/recipes/chicken-and-black-bean-salsa-burritos/'}

Here’s the Python code used to print the above:

from scrape_schema_recipe import scrape_url
import pprint

url = "https://medlineplus.gov/recipes/chicken-and-black-bean-salsa-burritos/"
results = scrape_url(url)
recipe = results[0]
pprint.pprint(recipe)

This library is hosted on PyPI. To run the example, install the library first:


$ pip3 install scrape-schema-recipe

This library scrapes the source and returns the results as a list of dictionaries. It can optionally convert some of the data into native Python objects,

  • so dates turn into datetime.date objects,
  • and durations (like “Cook Time: 30 minutes”) into datetime.timedelta objects (shown as datetime.timedelta(seconds=1800), because a timedelta stores days, seconds, and microseconds rather than minutes).
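To make the conversion concrete, here is a minimal sketch using only the standard library. It is not the library's own code: the helper name parse_simple_duration is hypothetical, and it handles only the hours/minutes/seconds subset of ISO 8601 durations (values such as 'PT30M' from the dictionary above). I believe the library enables its own conversion through an optional argument to scrape_url, but treat that as an assumption and check its documentation.

```python
import datetime
import re

# Hypothetical helper for illustration; not part of scrape-schema-recipe.
# Matches durations such as 'PT30M', 'PT1H15M', or 'PT45S'.
_DURATION_RE = re.compile(
    r"^PT(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?$"
)


def parse_simple_duration(text):
    """Convert a simple ISO 8601 duration string to a datetime.timedelta."""
    match = _DURATION_RE.match(text)
    if match is None or text == "PT":
        raise ValueError(f"unsupported duration: {text!r}")
    # Keep only the components that were present in the string.
    parts = {name: int(value) for name, value in match.groupdict().items() if value}
    return datetime.timedelta(**parts)


cook_time = parse_simple_duration("PT30M")            # 'cookTime' above
published = datetime.date.fromisoformat("2021-04-01")  # 'datePublished' above

print(repr(cook_time))  # datetime.timedelta(seconds=1800)
print(repr(published))  # datetime.date(2021, 4, 1)
```

Once the values are native objects, ordinary arithmetic works on them, for example adding the prep time and cook time to check them against the total time.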

Since schema.org/Recipe data is structured Microdata, it gives web designers and frontend web developers a standard way to structure recipe data. Other software can then copy that structured recipe or do something else with it (nutrition calculations, checking whether it meets a specific diet, and so on).
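As a rough illustration of the kind of nutrition calculation other software could do, the sketch below estimates the calories in the whole batch of burritos. It uses a small excerpt of the dictionary shown earlier; the helper leading_number is hypothetical and only pulls the first plain number out of a string.

```python
import re

# Excerpt of the recipe dictionary printed earlier in this post.
recipe = {
    "name": "Chicken and Black Bean Salsa Burritos",
    "nutrition": {"calories": "260 calories", "proteinContent": "22 grams"},
    "recipeYield": "8 servings",
}


def leading_number(text):
    """Return the first number found in text as a float (hypothetical helper)."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no number in {text!r}")
    return float(match.group())


servings = leading_number(recipe["recipeYield"])
calories_per_serving = leading_number(recipe["nutrition"]["calories"])
total_calories = servings * calories_per_serving

print(f"{recipe['name']}: about {total_calories:.0f} calories for the whole batch")
```

This simple parsing would not cope with fractions such as the '1/2 burrito' serving size, so real software would need more careful handling of quantities and units.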

Unit testing is used to ensure that future changes do not break existing functionality.

Jul 30, 2021
