As mentioned in the slides, JSON is very similar to Python's dictionaries. To demonstrate that, we'll go over a couple of examples.
import json
json_obj = {'Name': 'Interstellar', 'Genres': ['Science Fiction', 'Drama']}
# What the raw python looks like
print(json_obj)
print(type(json_obj))
str_obj = json.dumps(json_obj)
print(str_obj)
print(type(str_obj))
The only real change you may notice is that the single quotes (') were replaced with double quotes ("). This is because all strings must be double-quoted in valid JSON.
Now that it is a string, we can't index into it:
# The dictionary can be indexed by key
json_obj['Name']
# The string cannot: this line raises a TypeError
str_obj['Name']
Loading back in with json.loads(string) allows us to resume interacting with the Python object:
new_obj = json.loads(str_obj)
new_obj['Name']
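Quote style is not the only thing that changes on a trip through JSON. The sketch below uses a made-up `record` dictionary (not part of the example above) to show a few other conversions: tuples come back as lists, non-string keys come back as strings, and `True`/`None` are written as `true`/`null`.

```python
import json

# Hypothetical record chosen to exercise several type conversions
record = {
    "title": "Interstellar",
    "ratings": (8.7, 9),   # tuple
    "released": True,      # bool
    "sequel": None,        # None
    2014: "year",          # integer key
}

encoded = json.dumps(record)
print(encoded)
# {"title": "Interstellar", "ratings": [8.7, 9], "released": true, "sequel": null, "2014": "year"}

decoded = json.loads(encoded)
print(type(decoded["ratings"]))  # <class 'list'>
print(list(decoded.keys())[-1])  # 2014, now the string "2014"
print(decoded == record)         # False: the round trip changed some types
```

So a round trip through JSON only returns an identical object when the data sticks to types JSON itself has: strings, numbers, booleans, null, lists, and dictionaries with string keys.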
Rather than dumping to a string, we can also dump to a file using json.dump(obj, file_pointer).
with open('test_json.json', 'w') as json_file:
    json.dump(json_obj, json_file)
Likewise, we can read that information back in using json.load(file_pointer):
with open('test_json.json', 'r') as json_file:
    json_data = json.load(json_file)
print(json_data)
print()
print(json_data['Name'])
Sometimes when working with JSON we need to convert certain Python types so that they fit within JSON's limitations.
For example, datetime objects have no JSON equivalent, so json.dumps() cannot serialize them on its own.
import json
from datetime import datetime
json_obj = {'Name': 'Interstellar', 'Genres': ['Science Fiction', 'Drama'], 'Release Date': datetime.now()}
print(json_obj)
# Raises TypeError: Object of type datetime is not JSON serializable
print(json.dumps(json_obj))
In this instance we need to write some code to handle these conversions, that is, to serialize the data ourselves.
With Python's json module we can pass a serializer function to json.dumps() through its default argument. The encoder calls it for any object it cannot serialize on its own:
json.dumps(python_obj, default=json_serializer)
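def json_serializer(obj):
    # Convert datetime objects to a formatted string
    if isinstance(obj, datetime):
        return obj.strftime("%Y-%m-%d %H:%M:%S")
    # Reject other types so they are not silently written as null
    raise TypeError(f"Type {type(obj).__name__} is not JSON serializable")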
print(json.dumps(json_obj, default=json_serializer, indent=4))
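The serializer only covers the way out. When the string is loaded back, the date is still a plain string, because JSON has no date type. A minimal sketch of the return trip, assuming the same format string as above; the names `DATE_FORMAT`, `encode_datetime`, and `movie` are illustrative, and a fixed date stands in for datetime.now() so the result is repeatable:

```python
import json
from datetime import datetime

# Assumed to match the format string used when serializing
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"

def encode_datetime(obj):
    # Called by json.dumps for objects it cannot serialize itself
    if isinstance(obj, datetime):
        return obj.strftime(DATE_FORMAT)
    raise TypeError(f"Type {type(obj).__name__} is not JSON serializable")

# Illustrative record with a fixed date instead of datetime.now()
movie = {"Name": "Interstellar", "Release Date": datetime(2014, 11, 7, 0, 0, 0)}

encoded = json.dumps(movie, default=encode_datetime)
decoded = json.loads(encoded)
print(type(decoded["Release Date"]))  # <class 'str'>

# Parse the string back into a datetime by hand
decoded["Release Date"] = datetime.strptime(decoded["Release Date"], DATE_FORMAT)
print(decoded == movie)  # True
```

Note that this format keeps whole seconds only, so a value from datetime.now() would lose its microseconds on the way through.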
Pickle and Dill are two Python serialization packages we commonly use to save Python objects for later use. This might be a dataframe, a machine learning model, or some other object.
For deep learning we typically save our models in HDF5 (Hierarchical Data Format).
Similar to the json library, dill/pickle support the .dump() and .load() functions:
import dill as pkl
with open('test_file.pkl', 'wb') as pkl_file:
    pkl.dump(json_obj, pkl_file)
We can show this worked by reading back in the serialized file:
with open('test_file.pkl', 'rb') as pkl_file:
    data_obj = pkl.load(pkl_file)
print(data_obj)
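Unlike JSON, pickling keeps Python types intact, so the datetime needs no custom serializer. The sketch below shows this with the in-memory variants dumps() and loads(). It uses the standard-library pickle module rather than dill so that it runs without extra installs (dill offers functions of the same names), and the `movie` dictionary is an illustrative stand-in for the object above.

```python
import pickle
from datetime import datetime

# Illustrative object containing a type that JSON cannot represent
movie = {
    "Name": "Interstellar",
    "Genres": ["Science Fiction", "Drama"],
    "Release Date": datetime(2014, 11, 7),
}

payload = pickle.dumps(movie)      # serialize to bytes in memory
restored = pickle.loads(payload)   # rebuild the object from those bytes

print(type(payload))                   # <class 'bytes'>
print(type(restored["Release Date"]))  # <class 'datetime.datetime'>
print(restored == movie)               # True
```

The trade-off is that pickle files are Python-specific and not human-readable, and loading one can execute arbitrary code, so only unpickle data from a source you trust.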