census_api Demo Notebook
This notebook is intended to demonstrate the features of the census_api package.
import pandas as pd
from datascience import *
Before you begin working with the US Census API, you need to obtain an API key. This can be done here.
Once you have your API key, paste it in the cell below (where it says YOUR_API_KEY_GOES_HERE) and then run the cell. This creates a config.py file to store your API key safely; Git will ignore this file, since its pattern is listed in the .gitignore file.
my_api_key = "YOUR_API_KEY_GOES_HERE"
with open("config.py", "w+") as f:
    f.write("""api_key = \"{}\"""".format(my_api_key))
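As a sketch of a slightly more defensive variant of the cell above: the helper names write_config and load_config are hypothetical and not part of census_api. Using repr() keeps the generated file valid Python even if the key contained a quote or backslash, and the temporary directory is only there so the example does not overwrite a real config.py.

```python
import importlib.util
import os
import tempfile


def write_config(path, api_key):
    """Write a one-line Python module that defines api_key (hypothetical helper)."""
    with open(path, "w") as f:
        # repr() yields a valid Python string literal, escaping quotes if needed
        f.write("api_key = {!r}\n".format(api_key))


def load_config(path):
    """Import the module stored at `path` and return it (hypothetical helper)."""
    spec = importlib.util.spec_from_file_location("config", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Demonstration in a temporary directory so no real config.py is touched
with tempfile.TemporaryDirectory() as tmp:
    cfg_path = os.path.join(tmp, "config.py")
    write_config(cfg_path, "YOUR_API_KEY_GOES_HERE")
    cfg = load_config(cfg_path)
    print(cfg.api_key)
```

In the notebook itself, a plain import config does the same job as load_config, provided config.py sits next to the notebook.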
The next code cell will import this file so that your API key can be used in this notebook.
This package runs its queries through the CensusQuery class. To instantiate the class, you need your API key and the dataset you want to query. Currently, this package only supports querying the ACS1, ACS5, and SF1 datasets. When instantiating the class, you can also optionally provide a year to query data for and an output type.
import config
import census_api
# create the class instance
c = census_api.CensusQuery(config.api_key, "acs5")
To query the API, use the CensusQuery.query method. The parameters are listed below.
Parameter | Type | Description |
---|---|---|
variables | list | List of variables to extract from the API. For variable identifiers, find the dataset you're querying on this page and click on variables in its row. |
state | str | The 2-letter abbreviation of the state you want data for. |
county | str | Optional. The name of the county you want data for. Defaults to all. |
tract | str | Optional. The FIPS code of the tract you want data for. Defaults to all. |
year | int | Optional. The year you want data for. If provided, the year provided to the instance of CensusQuery is ignored. More info below. |
An example of a query is given below.
output_2014 = c.query(["NAME", "B00001_001E"], "CA", county="Alameda", year=2014)
output_2014.head()
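For orientation, here is a rough sketch of the kind of request URL a query like the one above could correspond to on the Census API side. This is not taken from the census_api source: build_census_url is a hypothetical helper, the acs/acs5 path segment and the tract-level geography are assumptions, and the state and county must already be given as FIPS codes (06 for California, 001 for Alameda County).

```python
from urllib.parse import quote, urlencode


def build_census_url(variables, state_fips, county_fips="*", tract="*",
                     year=2014, dataset="acs/acs5",
                     api_key="YOUR_API_KEY_GOES_HERE"):
    """Assemble a Census API request URL (hypothetical helper, not census_api code)."""
    base = "https://api.census.gov/data/{}/{}".format(year, dataset)
    params = {
        "get": ",".join(variables),                        # variables to return
        "for": "tract:{}".format(tract),                   # target geography
        "in": "state:{} county:{}".format(state_fips, county_fips),
        "key": api_key,
    }
    # Keep ',', ':' and '*' readable; spaces are percent-encoded as %20
    return base + "?" + urlencode(params, safe=",:*", quote_via=quote)


url = build_census_url(["NAME", "B00001_001E"], "06", county_fips="001")
print(url)
```

The sketch only builds the string; sending it requires a valid API key and a network connection.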
There are two ways to define the year that you want to query: in the class instance, or in the CensusQuery.query call. If you define it in the class instance, e.g. with
c_2015 = census_api.CensusQuery(config.api_key, "acs5", year=2015)
then you don't need to provide it when you call CensusQuery.query. However, if you do provide it to CensusQuery.query, the year for the class instance will be ignored. So, if I were to call
c_2015.query(["NAME"], "CA", year=2014)
I would get 2014 data, not 2015 data.
If you don't define it in the class instance, you must define it in the CensusQuery.query call, or else your output will be empty.
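The precedence rule described above can be illustrated with a small stand-in class. YearResolver is hypothetical and only mimics the documented behavior; it is not how CensusQuery is necessarily implemented.

```python
class YearResolver:
    """Hypothetical stand-in for the year handling described above."""

    def __init__(self, year=None):
        # Year given when the instance is created (may be None)
        self.year = year

    def resolve(self, year=None):
        # A year passed to the call takes precedence over the instance year
        if year is not None:
            return year
        # Falls back to the instance year; None means no year was given anywhere
        return self.year


c_2015 = YearResolver(year=2015)
print(c_2015.resolve())           # 2015 (instance year is used)
print(c_2015.resolve(year=2014))  # 2014 (call-level year wins)
print(YearResolver().resolve())   # None (no year defined at all)
```

The last case corresponds to the situation in which the notebook's query would come back empty.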
The CensusQuery class can output your data in one of two ways: as a pandas DataFrame or as a datascience Table. The out argument, set when instantiating the class, controls this. Its two possible values are "pd" and "ds", and it defaults to "pd".
ds_output = census_api.CensusQuery(config.api_key, "acs5", out="ds")
Now, if I were to make the same query as above, the output would be of class datascience.tables.Table.
# original instance
print(type(c.query(["NAME", "B00001_001E"], "CA", county="Alameda", year=2014)))
# datascience instance
print(type(ds_output.query(["NAME", "B00001_001E"], "CA", county="Alameda", year=2014)))
For more information about the Census API, visit https://www.census.gov/developers/. If you have any issues with census_api, please open an issue on our GitHub repo.