JSON¶

As mentioned in the slides, JSON is very similar to Python's dictionaries. To demonstrate that, we'll go over a couple of examples.

In [ ]:

import json

json_obj = {'Name': 'Interstellar', 'Genres': ['Science Fiction', 'Drama']}

# What the raw Python dictionary looks like
print(json_obj)
print(type(json_obj))

JSON String¶

In [ ]:

str_obj = json.dumps(json_obj)
print(str_obj)
print(type(str_obj))

The only real change you may notice is that the single quotes (') were replaced with double quotes ("). This is because all strings must be double-quoted in valid JSON.
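Quoting is not the only difference that can show up. As a small sketch (the `sample` dictionary below is made up for illustration and is not part of the movie data), a few other Python values are also rewritten to their JSON equivalents when dumped:

```python
import json

# Hypothetical sample data, chosen only to show how values are mapped
sample = {'flag': True, 'missing': None, 'pair': (1, 2)}

encoded = json.dumps(sample)

# True becomes true, None becomes null, and the tuple becomes a JSON array
print(encoded)
```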

Now that it is a string, we can no longer look up values by key:

In [ ]:

json_obj['Name']

In [ ]:

# Raises TypeError: a string cannot be indexed by key
str_obj['Name']

Loading it back in with json.loads(string) lets us resume interacting with it as a Python object:

In [ ]:

new_obj = json.loads(str_obj)
new_obj['Name']
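The movie dictionary survives this round trip unchanged, but that is not guaranteed for every Python object. The sketch below uses a hypothetical `original` dictionary to show two cases where the reloaded object differs from what was dumped:

```python
import json

# Hypothetical data: an integer key and a tuple value
original = {1: 'one', 'pair': (1, 2)}

restored = json.loads(json.dumps(original))

# JSON object keys are always strings, and JSON arrays load back as lists
print(restored)
print(restored == original)
```

If exact types matter after reloading, they need to be converted back by hand.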

Writing Out JSON¶

Rather than dumping to a string, we can also dump to a file using json.dump(python_obj, file_pointer).

In [ ]:

with open('test_json.json', 'w') as json_file:
    json.dump(json_obj, json_file)

Likewise, we can read that information back in using json.load(file_pointer):

In [ ]:

with open('test_json.json', 'r') as json_file:
    json_data = json.load(json_file)
    
print(json_data)
print()
print(json_data['Name'])
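As a further sketch of the same write-then-read pattern, the example below passes `indent=4` to `json.dump()` so the file is easier to read by eye, then checks that the reloaded data equals the original. The `movie` variable and the `pretty.json` file name are assumptions made for this example, and a temporary directory is used so no file is left behind:

```python
import json
import os
import tempfile

movie = {'Name': 'Interstellar', 'Genres': ['Science Fiction', 'Drama']}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'pretty.json')  # hypothetical file name

    # indent=4 pretty-prints the JSON across multiple lines
    with open(path, 'w') as json_file:
        json.dump(movie, json_file, indent=4)

    # Read the raw text so we can inspect what was actually written
    with open(path, 'r') as json_file:
        file_text = json_file.read()

reloaded = json.loads(file_text)

print(file_text)
print(reloaded == movie)
```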

Incompatible Data Types¶

Sometimes when working with JSON, we need to convert certain Python types so that they fit within JSON's limitations.

For example, datetime objects have no JSON equivalent, so json.dumps() raises a TypeError when it encounters one.

In [ ]:

import json
from datetime import datetime

json_obj = {'Name': 'Interstellar', 'Genres': ['Science Fiction', 'Drama'], 'Release Date': datetime.now()}

In [ ]:

print(json_obj)

In [ ]:

print(json.dumps(json_obj))
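To see the failure without stopping the notebook, the call can be wrapped in a try/except block. This is only a sketch; the `movie` and `error_message` names are introduced here for illustration:

```python
import json
from datetime import datetime

movie = {'Name': 'Interstellar', 'Release Date': datetime.now()}

try:
    json.dumps(movie)
    error_message = None
except TypeError as err:
    # json has no built-in way to represent a datetime
    error_message = str(err)

print(error_message)
```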

Serialization¶

In this instance, we need to write some code to handle these conversions, that is, to serialize the data.

Serialization: The process of translating a data structure or object state into a format that can be stored or transmitted and reconstructed later.

With Python's json module, we can pass a serializer function to json.dumps() through its default parameter. The function is called for any object that json cannot serialize on its own:

json.dumps(python_obj, default=json_serializer)

In [ ]:

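def json_serializer(obj):
    # Convert datetime objects to a formatted string
    if isinstance(obj, datetime):
        return obj.strftime("%Y-%m-%d %H:%M:%S")
    # Fail loudly for other unsupported types instead of silently returning None (null)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


print(json.dumps(json_obj, default=json_serializer, indent=4))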
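The definition above also mentions reconstructing the data later. One possible way to do that, sketched here under some assumptions, is the `object_hook` parameter of `json.loads()`, which is called with every decoded JSON object. The helper names `encode_datetime` and `decode_release_date` are hypothetical, the hook only converts the 'Release Date' key, and the timestamp is an arbitrary fixed value so the round trip is exact:

```python
import json
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"  # same format used when serializing

def encode_datetime(obj):
    # Turn datetime objects into formatted strings
    if isinstance(obj, datetime):
        return obj.strftime(DATE_FORMAT)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

def decode_release_date(dct):
    # Hypothetical hook: parse only the 'Release Date' key back into a datetime
    if 'Release Date' in dct:
        dct['Release Date'] = datetime.strptime(dct['Release Date'], DATE_FORMAT)
    return dct

# Arbitrary fixed timestamp (no microseconds, so nothing is lost by the format)
movie = {'Name': 'Interstellar', 'Release Date': datetime(2020, 1, 2, 3, 4, 5)}

text = json.dumps(movie, default=encode_datetime)
restored = json.loads(text, object_hook=decode_release_date)

print(text)
print(type(restored['Release Date']))
print(restored == movie)
```

Note that a value from datetime.now() usually carries microseconds, which this format drops, so the restored value would not compare equal in that case.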

Other Serialization¶

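pickle and dill are two Python serialization packages we commonly use to save Python objects for later use. pickle is part of the standard library, while dill is a third-party package that extends it. The saved object might be a dataframe, a machine learning model, or some other object.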

For deep learning, we typically save our models in HDF5 (Hierarchical Data Format).

Dill/Pickle¶

Similar to the json library, dill and pickle provide dump() and load() functions.

In [ ]:

import dill as pkl  # third-party package; mirrors the standard library's pickle API

with open('test_file.pkl', 'wb') as pkl_file:
    pkl.dump(json_obj, pkl_file)

We can show that this worked by reading the serialized file back in:

In [ ]:

with open('test_file.pkl', 'rb') as pkl_file:
    data_obj = pkl.load(pkl_file)
    print(data_obj)
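Unlike JSON, this kind of serialization keeps Python-specific types intact, so no custom serializer is needed for the datetime. The sketch below uses the standard library's pickle module (rather than dill) and the in-memory functions `pickle.dumps()` and `pickle.loads()`; the `movie` variable and its fixed timestamp are assumptions made for the example:

```python
import pickle
from datetime import datetime

# Arbitrary fixed timestamp, for illustration only
movie = {'Name': 'Interstellar', 'Release Date': datetime(2020, 1, 2, 3, 4, 5)}

payload = pickle.dumps(movie)     # a bytes object, not human-readable text
restored = pickle.loads(payload)

print(type(payload))
print(type(restored['Release Date']))  # still a datetime, not a string
print(restored == movie)
```

Because loading a pickle can execute arbitrary code, only unpickle data that comes from a source you trust.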