

Working with JSON

8 min to complete · By Martin Breuss

Most modern APIs provide their data as a JSON response by default, or at least give you the option to request it in JSON. JSON stands for JavaScript Object Notation. It was originally derived from JavaScript, but it's a language-independent data-exchange format. Alongside XML, it has established itself as the de facto standard for data interchange on the Internet.
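The short sketch below shows how JSON text maps onto Python types. It parses a hand-written JSON string with `json.loads()` from the standard library. The sample string is made up for illustration and isn't taken from the Studio Ghibli API.

```python
import json

# A hand-written JSON document (hypothetical sample, not real API data)
raw = '{"title": "Example", "year": 1986, "score": 9.5, "tags": ["anime", "film"], "sequel": null, "released": true}'

# json.loads() turns JSON text into the matching Python objects
parsed = json.loads(raw)

print(type(parsed))           # <class 'dict'>   (JSON object -> dict)
print(type(parsed["tags"]))   # <class 'list'>   (JSON array -> list)
print(type(parsed["year"]))   # <class 'int'>    (JSON integer -> int)
print(type(parsed["score"]))  # <class 'float'>  (JSON number with a fraction -> float)
print(parsed["sequel"])       # None             (JSON null -> None)
print(parsed["released"])     # True             (JSON true -> True)
```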

You might have already processed a JSON API response in the previous module of this course. If you haven't, this lesson will catch you up on how to work with JSON in Python. You'll work with the unofficial Studio Ghibli API, which "catalogs the people, places, and things found in the worlds of Ghibli [...]".

Studio Ghibli is a Japanese animation film studio that has created some of the most famous animated feature films.

Receive a JSON API Response

To get the JSON response from an API, you can use the requests package and make a call to the relevant API endpoint. For example, you can request a JSON response of all the films made by Studio Ghibli:

import requests

# Ask the /films endpoint for every Studio Ghibli film
response = requests.get("https://ghibliapi-iansedano.vercel.app/api/films")
print(response.json())  # decode the JSON body into Python objects

This request will give you the same data that you can also see when accessing the API endpoint in your browser:

JSON API response of all Studio Ghibli films

Advantages of JSON

The advantage of calling the .json() method on the Response object that requests returns is that the JSON data is already converted into familiar Python data types that you can work with:

print(type(response.json()))  # OUTPUT: <class 'list'>

This means that you can start slicing and accessing information from the returned data right away.
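The sketch below shows what working with the data "right away" can look like: it indexes, slices, and filters a list of dictionaries. To keep it runnable offline, it uses a small hand-written stand-in for `response.json()`. The three records and their values are hypothetical. Only the `title` and `director` keys are assumed to match the real response.

```python
# Hypothetical stand-in for response.json(): a list of dictionaries
data = [
    {"title": "Film A", "director": "Director X"},
    {"title": "Film B", "director": "Director Y"},
    {"title": "Film C", "director": "Director X"},
]

first_film = data[0]        # index into the list to get one dictionary
print(first_film["title"])  # Film A

first_two = data[:2]        # slice it like any other Python list
titles = [film["title"] for film in first_two]
print(titles)               # ['Film A', 'Film B']

# Filter with a list comprehension
by_x = [film["title"] for film in data if film["director"] == "Director X"]
print(by_x)                 # ['Film A', 'Film C']
```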

Parsing JSON in Python

Once you have access to your data in a familiar format, you can start to do some analysis on it or parse it for display.


Tasks

  • Print out all movie titles and their directors.
  • How long is the longest film? When was it released, and what's its original title? Pay attention to the data types of your values when you use max(), because strings and numbers compare differently.
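The data-type hint matters because `max()` compares strings character by character, not numerically. The sketch below assumes that the running time is stored as a string under a `running_time` key, so verify that against the real response. The three records are hypothetical.

```python
# Hypothetical records; running_time is deliberately stored as a string
films = [
    {"title": "Film A", "running_time": "95", "release_date": "1990"},
    {"title": "Film B", "running_time": "124", "release_date": "1986"},
    {"title": "Film C", "running_time": "86", "release_date": "1988"},
]

# String comparison is lexicographic: "95" > "86" > "124" because "9" > "8" > "1"
wrong = max(films, key=lambda film: film["running_time"])
print(wrong["title"])  # Film A

# Convert to int inside the key function to compare numerically
longest = max(films, key=lambda film: int(film["running_time"]))
print(longest["title"], longest["release_date"])  # Film B 1986
```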

You should always try to keep the number of calls you make to an API as low as possible. Every response costs computing power, which either you or the API's maintainer has to pay for.

It's therefore a good idea to save the JSON response to a file once and then read it from that file to perform your analysis.
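One way to put this into practice is a small file cache: read the file if it already exists, and only call the API when it doesn't. This is a minimal sketch. The function names `fetch_films()` and `load_or_fetch()` and the `fetch` parameter are hypothetical. The default fetcher assumes the films endpoint used in this lesson.

```python
import json
from pathlib import Path

FILMS_URL = "https://ghibliapi-iansedano.vercel.app/api/films"


def fetch_films():
    """Call the API once and return the decoded JSON."""
    import requests  # imported lazily: only needed when the cache file is missing

    response = requests.get(FILMS_URL, timeout=10)
    response.raise_for_status()  # raise on HTTP errors instead of caching an error body
    return response.json()


def load_or_fetch(path, fetch=fetch_films):
    """Return cached JSON from path, calling fetch() only if the file is missing."""
    path = Path(path)
    if path.exists():
        with path.open("r", encoding="utf-8") as fin:
            return json.load(fin)
    data = fetch()
    with path.open("w", encoding="utf-8") as fout:
        json.dump(data, fout)
    return data
```

Calling `load_or_fetch("films.json")` twice makes at most one request. The first call writes films.json, and the second reads it back. Passing a different `fetch` function also lets you test the caching logic without touching the network.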

Use Python's json.dump() and json.load()

To save any JSON-serializable Python object as a .json file, you can use the json.dump() function from Python's json module:

import json
import requests

# Make a single call to the API and decode the JSON body
response = requests.get("https://ghibliapi-iansedano.vercel.app/api/films")
data = response.json()

# Serialize the Python list back to JSON and save it to disk
with open("films.json", "w", encoding="utf-8") as fout:
    json.dump(data, fout)

This code snippet makes a single call to the Studio Ghibli API's /films endpoint, converts the JSON response into a list of dictionaries, and saves that list under the variable name data.

It then creates a new file called films.json and uses json.dump() to serialize the Python list back into valid JSON, which it writes to that file.
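By default, `json.dump()` writes compact output and escapes every non-ASCII character. Film data can contain Japanese text, such as an original title, so you might prefer output that is easier to read. The sketch below runs `json.dumps()`, which is `json.dump()` but returns a string instead of writing to a file, on a made-up record. It shows the effect of the `indent` and `ensure_ascii` parameters, which `json.dump()` accepts in the same way.

```python
import json

# Hypothetical record containing a non-ASCII character
record = {"title": "Example", "original_title": "例"}

compact = json.dumps(record)
print(compact)  # single line; the Japanese character is escaped as a \uXXXX sequence

readable = json.dumps(record, indent=2, ensure_ascii=False)
print(readable)  # indented over several lines, with the character kept as-is
```

If you pass `ensure_ascii=False` to `json.dump()`, open the file with `encoding="utf-8"` so those characters can be written regardless of your platform's default encoding.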

To access the data in your JSON file and work with it in Python, you can open the file and deserialize its contents with json.load():

import json

# Read the saved file instead of calling the API again
with open("films.json", "r", encoding="utf-8") as fin:
    data = json.load(fin)

print(len(data))  # OUTPUT: 21
print(type(data))  # OUTPUT: <class 'list'>

If you put this snippet in a new Python file, you won't need to make another API call. You'll also be free to analyze and format the data however you like to pull out the information that interests you.
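The json module has two reading functions with similar names. `json.load()` reads from an open file object, as in the snippet above, while `json.loads()` parses a string. The sketch below round-trips a made-up list through a temporary file to show that both give the same result. The sample data and the temporary path are for illustration only.

```python
import json
import tempfile
from pathlib import Path

films = [{"title": "Film A"}, {"title": "Film B"}]  # hypothetical data

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "films.json"

    # json.dump() writes to a file object
    with path.open("w", encoding="utf-8") as fout:
        json.dump(films, fout)

    # json.load() reads from a file object
    with path.open("r", encoding="utf-8") as fin:
        from_file = json.load(fin)

    # json.loads() parses a str, here the file's raw text
    from_string = json.loads(path.read_text(encoding="utf-8"))

print(from_file == from_string == films)  # True
```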


Tasks

  • Move the logic for calling the API into its own Python file, and save the response to a new JSON file.
  • Refactor the code you wrote earlier to analyze the Studio Ghibli films into a separate file, and have it pull the data from your saved JSON file.
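Here is one possible shape for the analysis side of that split. It's a minimal sketch that assumes a separate script has already written films.json and that every film has `title` and `director` keys. The helper names `load_films()` and `titles_and_directors()` are hypothetical.

```python
import json
from pathlib import Path


def load_films(path="films.json"):
    """Read the saved API response instead of calling the API again."""
    with Path(path).open("r", encoding="utf-8") as fin:
        return json.load(fin)


def titles_and_directors(films):
    """Return (title, director) pairs; assumes both keys exist on every film."""
    return [(film["title"], film["director"]) for film in films]
```

To print the results, you could loop over `titles_and_directors(load_films())` and print each pair. Keeping loading and analysis in separate functions makes it easy to test the analysis on a small hand-written list.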

Note: When working with APIs, it's generally good practice to keep the number of calls you make to the API as low as possible.

In the next lesson, you'll step away from the nicely structured data that APIs provide and dive into web scraping. Scraping is more complex, but it gives you access to data on the Internet that you otherwise couldn't get.



Summary: Working with JSON in Python

  • JSON stands for JavaScript Object Notation
  • JSON is a language-independent data-exchange format that has established itself as the de facto standard for data interchange on the Internet
  • Most modern APIs use JSON
  • JSON can be easily converted into Python data types
  • Python's standard library includes a json module for reading and writing JSON