Handling JSON Files

This notebook shows two ways to read JSON data:

  • using Python's built-in json module
  • using pandas
In [1]:
# import required libraries
import json
import pandas as pd

Utilities

In [2]:
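def print_nested_dicts(nested_dict, indent_level=0):
    """Print a nested dict, indenting each level of nesting with one tab.

    Args:
        nested_dict (dict): the dictionary to be printed
        indent_level (int): the current indentation level (number of tabs)

    Returns:
        None
    """
    indent = "\t" * indent_level
    for key, val in nested_dict.items():
        if isinstance(val, dict):
            # nested object: print its key, then recurse one level deeper
            print("{0}{1} : ".format(indent, key))
            print_nested_dicts(val, indent_level=indent_level + 1)
        elif isinstance(val, list):
            # array: print its key, then each element one level deeper
            print("{0}{1} : ".format(indent, key))
            for rec in val:
                if isinstance(rec, dict):
                    print_nested_dicts(rec, indent_level=indent_level + 1)
                else:
                    # scalar array elements have no key of their own
                    print("{0}{1}".format("\t" * (indent_level + 1), rec))
        else:
            print("{0}{1} : {2}".format(indent, key, val))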

def extract_json(file_name, do_print=True):
    """Extract JSON content from a file and optionally print it.

    Args:
        file_name (str): path of the file to be read
        do_print (bool): whether to print the file contents

    Returns:
        None
    """
    try:
        # the context manager closes the file even if parsing fails
        with open(file_name) as json_file:
            json_data = json.load(json_file)
    except IOError as err:
        raise IOError("File path incorrect/ File not found") from err
    except ValueError as err:
        # json.JSONDecodeError is a subclass of ValueError
        raise ValueError("JSON file has errors") from err

    if do_print:
        print_nested_dicts(json_data)

def extract_pandas_json(file_name, orientation="records", do_print=True):
    """Extract JSON content from a file using pandas and optionally print it.
       This is useful when the JSON data represents tabular or series information.

    Args:
        file_name (str): path of the file to be read
        orientation (str): the `orient` value passed to pd.read_json. Defaults to "records"
        do_print (bool): whether to print the file contents

    Returns:
        None
    """
    try:
        df = pd.read_json(file_name, orient=orientation)
    except IOError as err:
        raise IOError("File path incorrect/ File not found") from err
    except ValueError as err:
        # raised when the content is not valid JSON or does not match `orient`
        raise ValueError("JSON file has errors") from err

    if do_print:
        print(df)
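As an alternative to print_nested_dicts(), the standard library's json.dumps() can pretty-print an already parsed structure through its indent argument. A minimal sketch, using a small hypothetical dictionary rather than the notebook's sample file:

```python
import json

# hypothetical nested data, not the contents of sample_json.json
data = {"outer_col_1": [{"inner": 1}], "outer_col_2": {"inner_col_1": 3}}

# indent=2 puts each key on its own line, indented two spaces per nesting level;
# sort_keys=True gives a stable key order
pretty = json.dumps(data, indent=2, sort_keys=True)
print(pretty)
```

Unlike print_nested_dicts(), this also copes with a top-level array and with lists of scalars, and its output is still valid JSON that json.loads() can parse again.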

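Parse using the json module

The extract_json() function takes the path of the JSON file as its only required argument, parses the file with the json module, and prints the result with print_nested_dicts().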
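The json module offers two entry points: json.load() parses directly from an open file object, while json.loads() parses a string already read into memory. The sketch below, which writes a hypothetical JSON string to a temporary file purely for the demonstration, shows that both produce the same Python object:

```python
import json
import os
import tempfile

# hypothetical file contents, written to a temporary file for the demo
raw = '{"outer_col_3": 4, "outer_col_2": {"inner_col_1": 3}}'

with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    # json.load reads and parses from a file object in one step
    with open(path) as f:
        from_file = json.load(f)
    # json.loads parses a string that is already in memory
    from_string = json.loads(raw)
finally:
    os.remove(path)  # clean up the temporary file

print(from_file == from_string)  # True
```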

In [3]:
extract_json(r'sample_json.json')
outer_col_2 : 
	inner_col_1 : 3
outer_col_1 : 
	nested_inner_col_1 : val_1
	nested_inner_col_2 : 2
	nested_inner_col_1 : val_2
	nested_inner_col_2 : 2
outer_col_3 : 4

The output mirrors the structure of the JSON itself: outer_col_2 holds a nested object, and outer_col_1 holds a list of objects (hence its keys appear twice), so their keys are printed one tab deeper than the top-level keys.
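print_nested_dicts() can tell objects from arrays because json.loads() maps JSON objects to dict, arrays to list, and numbers to int or float. A minimal sketch using a hypothetical document shaped like the printed output (not the actual contents of sample_json.json):

```python
import json

# hypothetical document: an array of objects under outer_col_1,
# an object under outer_col_2 and a plain number under outer_col_3
doc = json.loads("""
{
  "outer_col_1": [
    {"nested_inner_col_1": "val_1", "nested_inner_col_2": 2},
    {"nested_inner_col_1": "val_2", "nested_inner_col_2": 2}
  ],
  "outer_col_2": {"inner_col_1": 3},
  "outer_col_3": 4
}
""")

# show which Python type each top-level value was mapped to
for key, val in doc.items():
    print(key, type(val).__name__)
```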


Parse using pandas

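The extract_pandas_json() function also takes the file path as its argument, but delegates parsing to pd.read_json(), which returns a DataFrame. The orientation argument (default "records") tells pandas how the JSON is laid out.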
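pd.read_json() accepts a file path or a file-like object. A minimal sketch that reads a small, hypothetical records-oriented JSON string (an array with one object per row) through io.StringIO:

```python
from io import StringIO

import pandas as pd

# hypothetical records-oriented JSON: an array with one object per row
records_json = '[{"col_1": "a", "col_2": "b"}, {"col_1": "c", "col_2": "d"}]'

# wrapping the string in StringIO avoids the deprecation warning that newer
# pandas versions emit when literal JSON text is passed to read_json
df = pd.read_json(StringIO(records_json), orient="records")
print(df)
```

Each object becomes a row and each key becomes a column, which is the same shape as the output of the cell below.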

In [4]:
extract_pandas_json(r'pandas_json.json')
  col_1 col_2
0     a     b
1     c     d
2     e     f
3     g     h
4     i     j
5     k     l

The output above shows how pandas reads records-oriented JSON (an array of objects) into a tabular DataFrame, with one row per record and one column per key.
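The orientation parameter matters because the same table can be serialized in several JSON layouts. The sketch below uses DataFrame.to_json() on a small hypothetical frame to show two of them; whichever layout a file uses is the orient value to pass when reading it back:

```python
import pandas as pd

# hypothetical two-row frame with the same columns as the output above
df = pd.DataFrame({"col_1": ["a", "c"], "col_2": ["b", "d"]})

# "records": a list with one object per row
as_records = df.to_json(orient="records")
# "columns": one object per column, keyed by the row index
as_columns = df.to_json(orient="columns")

print(as_records)
print(as_columns)
```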