census_api Demo Notebook

This notebook is intended to demonstrate the features of the census_api package.

In [ ]:
import pandas as pd
from datascience import *

Before you Start

Before working with the US Census API, you need an API key, which you can request through the Census Bureau's developer site (https://www.census.gov/developers/).

Once you have your API key, paste it into the cell below in place of YOUR_API_KEY_GOES_HERE and run the cell. The cell writes your key to a config.py file. Because this file pattern is listed in the .gitignore file, Git will not track config.py, which keeps your key out of version control.

In [ ]:
my_api_key = "YOUR_API_KEY_GOES_HERE"

# write the key to config.py as a one-line Python module
with open("config.py", "w") as f:
    f.write('api_key = "{}"\n'.format(my_api_key))
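To check that a config file written this way can be read back, the round trip can be sketched as follows. This is a minimal illustration, not part of census_api: the helper names write_config and read_config are hypothetical, and the example writes to a temporary directory so it does not touch your real config.py.

```python
import os
import runpy
import tempfile


def write_config(path, api_key):
    """Write a one-line Python module that defines api_key (hypothetical helper)."""
    with open(path, "w") as f:
        # repr() adds the quotes and escapes any special characters in the key
        f.write("api_key = {!r}\n".format(api_key))


def read_config(path):
    """Execute the config file and return the api_key it defines (hypothetical helper)."""
    return runpy.run_path(path)["api_key"]


# Round trip in a temporary directory, using the placeholder key only.
with tempfile.TemporaryDirectory() as tmp:
    config_path = os.path.join(tmp, "config.py")
    write_config(config_path, "YOUR_API_KEY_GOES_HERE")
    loaded_key = read_config(config_path)

print(loaded_key)
```

In the notebook itself, a plain import config does the same job as read_config, provided config.py sits in the working directory.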

The next code cell will import this file so that your API key can be used in this notebook.

Initializing the Query Class

All queries in this package run through the CensusQuery class. To instantiate it, you need your API key and the name of the dataset you want to query. The package currently supports only the ACS1, ACS5, and SF1 datasets. You can also optionally provide a year to query and an output type when instantiating the class.

In [ ]:
import config
import census_api

# create the class instance
c = census_api.CensusQuery(config.api_key, "acs5")

Making Queries

To query the API, use the CensusQuery.query method. The parameters are listed below.

| Parameter | Type | Description |
| --- | --- | --- |
| variables | list | List of variables to extract from the API. For variable identifiers, find the dataset you're querying on this page and click on variables in its row. |
| state | str | The 2-letter abbreviation of the state you want data for. |
| county | str | Optional. The name of the county you want data for. Defaults to all. |
| tract | str | Optional. The FIPS code of the tract you want data for. Defaults to all. |
| year | int | Optional. The year you want data for. If provided, the year given to the CensusQuery instance is ignored. See the Years section below. |

An example of a query is given below.

In [ ]:
output_2014 = c.query(["NAME", "B00001_001E"], "CA", county="Alameda", year=2014)
output_2014.head()
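For orientation, a query like the one above roughly corresponds to a single request against the raw Census Data API. The sketch below only builds such a URL and sends nothing. It is not taken from census_api: the function name build_census_url is hypothetical, the base URL pattern is an assumption about the public ACS 5-year endpoint, and the FIPS codes (06 for California, 001 for Alameda County) are supplied by hand here, whereas the package accepts a state abbreviation and a county name.

```python
from urllib.parse import urlencode

# Assumed base URL pattern for the raw Census Data API (not taken from census_api).
BASE_URL = "https://api.census.gov/data/{year}/acs/acs5"


def build_census_url(variables, state_fips, county_fips="*", year=2014, key=None):
    """Build a county-level request URL (hypothetical helper, no network access)."""
    params = {
        "get": ",".join(variables),             # comma-separated variable IDs
        "for": "county:{}".format(county_fips),  # "*" means every county
        "in": "state:{}".format(state_fips),
    }
    if key is not None:
        params["key"] = key
    # Keep commas, colons, and asterisks unescaped so the URL stays readable.
    return BASE_URL.format(year=year) + "?" + urlencode(params, safe=",:*")


url = build_census_url(["NAME", "B00001_001E"], "06", county_fips="001", year=2014)
print(url)
```

Wrapping this in a class is what lets you work with names such as "CA" and "Alameda" instead of looking up FIPS codes yourself.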

Years

There are two ways to define the year that you want to query: in the class instance, or in the CensusQuery.query call. If you define it in the class instance, e.g. with

c_2015 = census_api.CensusQuery(config.api_key, "acs5", year=2015)

then you don't need to provide it when you call CensusQuery.query. However, if you do provide it to CensusQuery.query, the year for the class instance will be ignored. So, if I were to call

c_2015.query(["NAME"], "CA", year=2014)

I would get 2014 data, not 2015 data.

If you don't define it in the class instance, you must define it in the CensusQuery.query call, or else your output will be empty.
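The precedence rule described in this section can be summarized in a few lines. This is only an illustration of the rule as stated above, not the package's actual code, and resolve_year is a hypothetical name.

```python
def resolve_year(instance_year=None, call_year=None):
    """Return the year a query would use under the rule described above.

    A year passed to the query call wins over the year stored on the
    instance. None means no year was given at that level.
    """
    if call_year is not None:
        return call_year
    return instance_year


print(resolve_year(instance_year=2015, call_year=2014))  # year from the call
print(resolve_year(instance_year=2015))                  # falls back to the instance
print(resolve_year())                                    # no year defined anywhere
```

The last case, where no year is defined at either level, is the one the section warns about: the query has no year to use and the output comes back empty.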

Output Types

The CensusQuery class can output your data in one of two ways: as a pandas DataFrame or as a datascience Table. You choose between them with the out argument when instantiating the class. The two possible values of out are "pd" and "ds", and the default is "pd".

In [ ]:
ds_output = census_api.CensusQuery(config.api_key, "acs5", out="ds")

Now, if I were to make the same query as above, the output would be of class datascience.tables.Table.

In [ ]:
# original instance
print(type(c.query(["NAME", "B00001_001E"], "CA", county="Alameda", year=2014)))

# datascience instance
print(type(ds_output.query(["NAME", "B00001_001E"], "CA", county="Alameda", year=2014)))

More Information

For more information about the Census API, visit https://www.census.gov/developers/. If you run into problems with census_api, please open an issue on our GitHub repo.