cenpy to analyze Segregation in US Cities
Levi John Wolf
University of Bristol
levi.john.wolf@bristol.ac.uk
cenpy makes it dead simple to fetch demographic data. It works by automatically discovering different data products made available by the US Census Bureau, exposing these data products in a consistent pythonic fashion, and then wrangling the data into a clean geopandas dataframe. This is useful in the analysis of segregation in American cities. It's often difficult to do large-scale demographic analysis because demographic data at sufficiently fine-grained spatial resolution is hard to get and process over a large geographic extent. Analyzing even a few cities in a few states rapidly becomes a difficult analytical task. With cenpy, though, this becomes easy.
Further, the new segregation package allows for the computation and comparison of segregation measures in different urban systems. While it's often easy to compute a single measure of segregation, it's difficult to conduct inference on that measure. Thus, we can easily figure out how segregated a place is, but not the intrinsic uncertainty in that estimated segregation measure. Further, any comparison of segregation indices between places or across time should consider this uncertainty.
Fortunately, using the cenpy and segregation packages in Python, we can conduct fast analyses to examine how segregation changes over time or between cities. Below, I'll walk through an example of how you can examine changes in the segregation of Hispanic populations over time in Phoenix and compare segregation between Phoenix and Austin.
First, the packages we need are cenpy and segregation. But, to help get a sense of what the areas look like, I use contextily, a simple package to request basemap tiles to use in matplotlib plots.
import cenpy
import segregation
import contextily
import matplotlib.pyplot as plt  # needed for the plt.subplots calls below
%matplotlib inline
Cenpy can be used in two different ways. The new product API focuses on using geographical names to make querying as simple as possible. But, because it requires a little more prior knowledge about how queries should be formed, only a limited number of data products are supported: the 5-year ACS and the 2010 Decennial census. By default, the most recent 5-year ACS is fetched.
acs = cenpy.products.ACS()
Once the product is built, it has a few useful attributes. All of the variables, or columns in the Census's database, are contained within the dataframe variables:
acs.variables
 | attributes | concept | group | label | limit | predicateOnly | predicateType | required | values |
---|---|---|---|---|---|---|---|---|---|
AIANHH | NaN | NaN | N/A | Geography | 0 | NaN | NaN | NaN | NaN |
AIHHTL | NaN | NaN | N/A | Geography | 0 | NaN | NaN | NaN | NaN |
AIRES | NaN | NaN | N/A | Geography | 0 | NaN | NaN | NaN | NaN |
ANRC | NaN | NaN | N/A | Geography | 0 | NaN | NaN | NaN | NaN |
B00001_001E | B00001_001EA | UNWEIGHTED SAMPLE COUNT OF THE POPULATION | B00001 | Estimate!!Total | 0 | NaN | int | NaN | NaN |
B00002_001E | B00002_001EA | UNWEIGHTED SAMPLE HOUSING UNITS | B00002 | Estimate!!Total | 0 | NaN | int | NaN | NaN |
B01001A_001E | B01001A_001M,B01001A_001MA,B01001A_001EA | SEX BY AGE (WHITE ALONE) | B01001A | Estimate!!Total | 0 | NaN | int | NaN | NaN |
B01001A_002E | B01001A_002M,B01001A_002MA,B01001A_002EA | SEX BY AGE (WHITE ALONE) | B01001A | Estimate!!Total!!Male | 0 | NaN | int | NaN | NaN |
B01001A_003E | B01001A_003M,B01001A_003MA,B01001A_003EA | SEX BY AGE (WHITE ALONE) | B01001A | Estimate!!Total!!Male!!Under 5 years | 0 | NaN | int | NaN | NaN |
B01001A_004E | B01001A_004M,B01001A_004MA,B01001A_004EA | SEX BY AGE (WHITE ALONE) | B01001A | Estimate!!Total!!Male!!5 to 9 years | 0 | NaN | int | NaN | NaN |
B01001A_005E | B01001A_005M,B01001A_005MA,B01001A_005EA | SEX BY AGE (WHITE ALONE) | B01001A | Estimate!!Total!!Male!!10 to 14 years | 0 | NaN | int | NaN | NaN |
B01001A_006E | B01001A_006M,B01001A_006MA,B01001A_006EA | SEX BY AGE (WHITE ALONE) | B01001A | Estimate!!Total!!Male!!15 to 17 years | 0 | NaN | int | NaN | NaN |
B01001A_007E | B01001A_007M,B01001A_007MA,B01001A_007EA | SEX BY AGE (WHITE ALONE) | B01001A | Estimate!!Total!!Male!!18 and 19 years | 0 | NaN | int | NaN | NaN |
B01001A_008E | B01001A_008M,B01001A_008MA,B01001A_008EA | SEX BY AGE (WHITE ALONE) | B01001A | Estimate!!Total!!Male!!20 to 24 years | 0 | NaN | int | NaN | NaN |
B01001A_009E | B01001A_009M,B01001A_009MA,B01001A_009EA | SEX BY AGE (WHITE ALONE) | B01001A | Estimate!!Total!!Male!!25 to 29 years | 0 | NaN | int | NaN | NaN |
B01001A_010E | B01001A_010M,B01001A_010MA,B01001A_010EA | SEX BY AGE (WHITE ALONE) | B01001A | Estimate!!Total!!Male!!30 to 34 years | 0 | NaN | int | NaN | NaN |
B01001A_011E | B01001A_011M,B01001A_011MA,B01001A_011EA | SEX BY AGE (WHITE ALONE) | B01001A | Estimate!!Total!!Male!!35 to 44 years | 0 | NaN | int | NaN | NaN |
B01001A_012E | B01001A_012M,B01001A_012MA,B01001A_012EA | SEX BY AGE (WHITE ALONE) | B01001A | Estimate!!Total!!Male!!45 to 54 years | 0 | NaN | int | NaN | NaN |
B01001A_013E | B01001A_013M,B01001A_013MA,B01001A_013EA | SEX BY AGE (WHITE ALONE) | B01001A | Estimate!!Total!!Male!!55 to 64 years | 0 | NaN | int | NaN | NaN |
B01001A_014E | B01001A_014M,B01001A_014MA,B01001A_014EA | SEX BY AGE (WHITE ALONE) | B01001A | Estimate!!Total!!Male!!65 to 74 years | 0 | NaN | int | NaN | NaN |
B01001A_015E | B01001A_015M,B01001A_015MA,B01001A_015EA | SEX BY AGE (WHITE ALONE) | B01001A | Estimate!!Total!!Male!!75 to 84 years | 0 | NaN | int | NaN | NaN |
B01001A_016E | B01001A_016M,B01001A_016MA,B01001A_016EA | SEX BY AGE (WHITE ALONE) | B01001A | Estimate!!Total!!Male!!85 years and over | 0 | NaN | int | NaN | NaN |
B01001A_017E | B01001A_017M,B01001A_017MA,B01001A_017EA | SEX BY AGE (WHITE ALONE) | B01001A | Estimate!!Total!!Female | 0 | NaN | int | NaN | NaN |
B01001A_018E | B01001A_018M,B01001A_018MA,B01001A_018EA | SEX BY AGE (WHITE ALONE) | B01001A | Estimate!!Total!!Female!!Under 5 years | 0 | NaN | int | NaN | NaN |
B01001A_019E | B01001A_019M,B01001A_019MA,B01001A_019EA | SEX BY AGE (WHITE ALONE) | B01001A | Estimate!!Total!!Female!!5 to 9 years | 0 | NaN | int | NaN | NaN |
B01001A_020E | B01001A_020M,B01001A_020MA,B01001A_020EA | SEX BY AGE (WHITE ALONE) | B01001A | Estimate!!Total!!Female!!10 to 14 years | 0 | NaN | int | NaN | NaN |
B01001A_021E | B01001A_021M,B01001A_021MA,B01001A_021EA | SEX BY AGE (WHITE ALONE) | B01001A | Estimate!!Total!!Female!!15 to 17 years | 0 | NaN | int | NaN | NaN |
B01001A_022E | B01001A_022M,B01001A_022MA,B01001A_022EA | SEX BY AGE (WHITE ALONE) | B01001A | Estimate!!Total!!Female!!18 and 19 years | 0 | NaN | int | NaN | NaN |
B01001A_023E | B01001A_023M,B01001A_023MA,B01001A_023EA | SEX BY AGE (WHITE ALONE) | B01001A | Estimate!!Total!!Female!!20 to 24 years | 0 | NaN | int | NaN | NaN |
B01001A_024E | B01001A_024M,B01001A_024MA,B01001A_024EA | SEX BY AGE (WHITE ALONE) | B01001A | Estimate!!Total!!Female!!25 to 29 years | 0 | NaN | int | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
COUSUB | NaN | NaN | N/A | Geography | 0 | NaN | NaN | NaN | NaN |
CSA | NaN | NaN | N/A | Geography | 0 | NaN | NaN | NaN | NaN |
DIVISION | NaN | NaN | N/A | Geography | 0 | NaN | NaN | NaN | NaN |
GEOCOMP | NaN | NaN | N/A | Geographic Component code | 0 | NaN | string | default displayed | {'item': {'R1': 'Not in an offshore area', 'S0... |
GEO_ID | NAME | ALLOCATION OF EDUCATIONAL ATTAINMENT FOR THE P... | B17015,B18104,B17016,B18105,B17017,B18106,B170... | Geography | 0 | NaN | string | NaN | NaN |
METDIV | NaN | NaN | N/A | Geography | 0 | NaN | NaN | NaN | NaN |
NATION | NaN | NaN | N/A | Geography | 0 | NaN | NaN | NaN | NaN |
NECTA | NaN | NaN | N/A | Geography | 0 | NaN | NaN | NaN | NaN |
NECTADIV | NaN | NaN | N/A | Geography | 0 | NaN | NaN | NaN | NaN |
PLACE | NaN | NaN | N/A | Geography | 0 | NaN | NaN | NaN | NaN |
PLACEREM | NaN | NaN | N/A | Geography | 0 | NaN | NaN | NaN | NaN |
PRINCITY | NaN | NaN | N/A | Geography | 0 | NaN | NaN | NaN | NaN |
PUMA5 | NaN | NaN | N/A | Geography | 0 | NaN | NaN | NaN | NaN |
REGION | NaN | NaN | N/A | Geography | 0 | NaN | NaN | NaN | NaN |
SDELM | NaN | NaN | N/A | Geography | 0 | NaN | NaN | NaN | NaN |
SDSEC | NaN | NaN | N/A | Geography | 0 | NaN | NaN | NaN | NaN |
SDUNI | NaN | NaN | N/A | Geography | 0 | NaN | NaN | NaN | NaN |
SLDL | NaN | NaN | N/A | Geography | 0 | NaN | NaN | NaN | NaN |
SLDU | NaN | NaN | N/A | Geography | 0 | NaN | NaN | NaN | NaN |
STATE | NaN | NaN | N/A | Geography | 0 | NaN | NaN | NaN | NaN |
SUBMCD | NaN | NaN | N/A | Geography | 0 | NaN | NaN | NaN | NaN |
TRACT | NaN | NaN | N/A | Geography | 0 | NaN | NaN | NaN | NaN |
TRIBALBG | NaN | NaN | N/A | Geography | 0 | NaN | NaN | NaN | NaN |
TRIBALCT | NaN | NaN | N/A | Geography | 0 | NaN | NaN | NaN | NaN |
TRISUBREM | NaN | NaN | N/A | Geography | 0 | NaN | NaN | NaN | NaN |
UA | NaN | NaN | N/A | Geography | 0 | NaN | NaN | NaN | NaN |
ZCTA | NaN | NaN | N/A | Geography | 0 | NaN | NaN | NaN | NaN |
for | NaN | Census API Geography Specification | N/A | Census API FIPS 'for' clause | 0 | True | fips-for | NaN | NaN |
in | NaN | Census API Geography Specification | N/A | Census API FIPS 'in' clause | 0 | True | fips-in | NaN | NaN |
ucgid | NaN | Census API Geography Specification | N/A | Uniform Census Geography Identifier clause | 0 | True | ucgid | NaN | NaN |
25110 rows × 9 columns
That's a lot of variables! There are so many that it's nearly impossible to understand them all at a glance. Most census variable names are composed of a table identifier, which tells you the general topic of the variable, followed by a position in that table. Fortunately, cenpy allows you to examine the tables directly, which are a little easier to understand:
acs.tables
table_name | description | columns |
---|---|---|
B00001 | UNWEIGHTED SAMPLE COUNT OF THE POPULATION | [B00001_001E] |
B00002 | UNWEIGHTED SAMPLE HOUSING UNITS | [B00002_001E] |
B01001 | SEX BY AGE | [B01001_001E, B01001_002E, B01001_003E, B01001... |
B01002 | MEDIAN AGE BY SEX | [B01002_001E, B01002_002E, B01002_003E] |
B01003 | TOTAL POPULATION | [B01003_001E] |
B02001 | RACE | [B02001_001E, B02001_002E, B02001_003E, B02001... |
B02008 | WHITE ALONE OR IN COMBINATION WITH ONE OR MORE... | [B02008_001E] |
B02009 | BLACK OR AFRICAN AMERICAN ALONE OR IN COMBINAT... | [B02009_001E] |
B02010 | AMERICAN INDIAN AND ALASKA NATIVE ALONE OR IN ... | [B02010_001E] |
B02011 | ASIAN ALONE OR IN COMBINATION WITH ONE OR MORE... | [B02011_001E] |
B02012 | NATIVE HAWAIIAN AND OTHER PACIFIC ISLANDER ALO... | [B02012_001E] |
B02013 | SOME OTHER RACE ALONE OR IN COMBINATION WITH O... | [B02013_001E] |
B02014 | AMERICAN INDIAN AND ALASKA NATIVE ALONE FOR SE... | [B02014_001E, B02014_002E, B02014_003E, B02014... |
B02015 | ASIAN ALONE BY SELECTED GROUPS | [B02015_001E, B02015_002E, B02015_003E, B02015... |
B02016 | NATIVE HAWAIIAN AND OTHER PACIFIC ISLANDER ALO... | [B02016_001E, B02016_002E, B02016_003E, B02016... |
B02017 | AMERICAN INDIAN AND ALASKA NATIVE (AIAN) ALONE... | [B02017_001E, B02017_002E, B02017_003E, B02017... |
B02018 | ASIAN ALONE OR IN ANY COMBINATION BY SELECTED ... | [B02018_001E, B02018_002E, B02018_003E, B02018... |
B02019 | NATIVE HAWAIIAN AND OTHER PACIFIC ISLANDER ALO... | [B02019_001E, B02019_002E, B02019_003E, B02019... |
B03001 | HISPANIC OR LATINO ORIGIN BY SPECIFIC ORIGIN | [B03001_001E, B03001_002E, B03001_003E, B03001... |
B03002 | HISPANIC OR LATINO ORIGIN BY RACE | [B03002_001E, B03002_002E, B03002_003E, B03002... |
B03003 | HISPANIC OR LATINO ORIGIN | [B03003_001E, B03003_002E, B03003_003E] |
B04004 | PEOPLE REPORTING SINGLE ANCESTRY | [B04004_001E, B04004_002E, B04004_003E, B04004... |
B04005 | PEOPLE REPORTING MULTIPLE ANCESTRY | [B04005_001E, B04005_002E, B04005_003E, B04005... |
B04006 | PEOPLE REPORTING ANCESTRY | [B04006_001E, B04006_002E, B04006_003E, B04006... |
B04007 | ANCESTRY | [B04007_001E, B04007_002E, B04007_003E, B04007... |
B05001 | NATIVITY AND CITIZENSHIP STATUS IN THE UNITED ... | [B05001_001E, B05001_002E, B05001_003E, B05001... |
B05002 | PLACE OF BIRTH BY NATIVITY AND CITIZENSHIP STATUS | [B05002_001E, B05002_002E, B05002_003E, B05002... |
B05003 | SEX BY AGE BY NATIVITY AND CITIZENSHIP STATUS | [B05003_001E, B05003_002E, B05003_003E, B05003... |
B05004 | MEDIAN AGE BY NATIVITY AND CITIZENSHIP STATUS ... | [B05004_001E, B05004_002E, B05004_003E, B05004... |
B05005 | PERIOD OF ENTRY BY NATIVITY AND CITIZENSHIP ST... | [B05005_001E, B05005_002E, B05005_003E, B05005... |
... | ... | ... |
C15010 | FIELD OF BACHELOR'S DEGREE FOR FIRST MAJOR FOR... | [C15010_001E, C15010_002E, C15010_003E, C15010... |
C16001 | LANGUAGE SPOKEN AT HOME FOR THE POPULATION 5 Y... | [C16001_001E, C16001_002E, C16001_003E, C16001... |
C16002 | HOUSEHOLD LANGUAGE BY HOUSEHOLD LIMITED ENGLIS... | [C16002_001E, C16002_002E, C16002_003E, C16002... |
C17002 | RATIO OF INCOME TO POVERTY LEVEL IN THE PAST 1... | [C17002_001E, C17002_002E, C17002_003E, C17002... |
C18108 | AGE BY NUMBER OF DISABILITIES | [C18108_001E, C18108_002E, C18108_003E, C18108... |
C18120 | EMPLOYMENT STATUS BY DISABILITY STATUS | [C18120_001E, C18120_002E, C18120_003E, C18120... |
C18121 | WORK EXPERIENCE BY DISABILITY STATUS | [C18121_001E, C18121_002E, C18121_003E, C18121... |
C18130 | AGE BY DISABILITY STATUS BY POVERTY STATUS | [C18130_001E, C18130_002E, C18130_003E, C18130... |
C18131 | RATIO OF INCOME TO POVERTY LEVEL IN THE PAST 1... | [C18131_001E, C18131_002E, C18131_003E, C18131... |
C21007 | AGE BY VETERAN STATUS BY POVERTY STATUS IN THE... | [C21007_001E, C21007_002E, C21007_003E, C21007... |
C24010 | SEX BY OCCUPATION FOR THE CIVILIAN EMPLOYED PO... | [C24010_001E, C24010_002E, C24010_003E, C24010... |
C24020 | SEX BY OCCUPATION FOR THE FULL-TIME, YEAR-ROUN... | [C24020_001E, C24020_002E, C24020_003E, C24020... |
C24030 | SEX BY INDUSTRY FOR THE CIVILIAN EMPLOYED POPU... | [C24030_001E, C24030_002E, C24030_003E, C24030... |
C24040 | SEX BY INDUSTRY FOR THE FULL-TIME, YEAR-ROUND ... | [C24040_001E, C24040_002E, C24040_003E, C24040... |
C24050 | INDUSTRY BY OCCUPATION FOR THE CIVILIAN EMPLO... | [C24050_001E, C24050_002E, C24050_003E, C24050... |
C24060 | OCCUPATION BY CLASS OF WORKER FOR THE CIVILIAN... | [C24060_001E, C24060_002E, C24060_003E, C24060... |
C24070 | INDUSTRY BY CLASS OF WORKER FOR THE CIVILIAN E... | [C24070_001E, C24070_002E, C24070_003E, C24070... |
C27004 | EMPLOYER-BASED HEALTH INSURANCE BY SEX BY AGE | [C27004_001E, C27004_002E, C27004_003E, C27004... |
C27005 | DIRECT-PURCHASE HEALTH INSURANCE BY SEX BY AGE | [C27005_001E, C27005_002E, C27005_003E, C27005... |
C27006 | MEDICARE COVERAGE BY SEX BY AGE | [C27006_001E, C27006_002E, C27006_003E, C27006... |
C27007 | MEDICAID/MEANS-TESTED PUBLIC COVERAGE BY SEX B... | [C27007_001E, C27007_002E, C27007_003E, C27007... |
C27008 | TRICARE/MILITARY HEALTH COVERAGE BY SEX BY AGE | [C27008_001E, C27008_002E, C27008_003E, C27008... |
C27009 | VA HEALTH CARE BY SEX BY AGE | [C27009_001E, C27009_002E, C27009_003E, C27009... |
C27012 | HEALTH INSURANCE COVERAGE STATUS AND TYPE BY W... | [C27012_001E, C27012_002E, C27012_003E, C27012... |
C27013 | PRIVATE HEALTH INSURANCE BY WORK EXPERIENCE | [C27013_001E, C27013_002E, C27013_003E, C27013... |
C27014 | PUBLIC HEALTH INSURANCE BY WORK EXPERIENCE | [C27014_001E, C27014_002E, C27014_003E, C27014... |
C27016 | HEALTH INSURANCE COVERAGE STATUS BY RATIO OF I... | [C27016_001E, C27016_002E, C27016_003E, C27016... |
C27017 | PRIVATE HEALTH INSURANCE BY RATIO OF INCOME TO... | [C27017_001E, C27017_002E, C27017_003E, C27017... |
C27018 | PUBLIC HEALTH INSURANCE BY RATIO OF INCOME TO ... | [C27018_001E, C27018_002E, C27018_003E, C27018... |
C27021 | HEALTH INSURANCE COVERAGE STATUS BY LIVING AR... | [C27021_001E, C27021_002E, C27021_003E, C27021... |
665 rows × 2 columns
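The naming convention described above (a table identifier followed by a position within the table) can be unpacked with a few lines of Python. This is only an illustrative sketch: `split_variable` is a hypothetical helper, not part of cenpy, and it assumes names shaped like those in the listings above, such as `B03002_012E`, where the trailing letters mark the kind of value (for example `E` for an estimate and `M` for a margin of error).

```python
import re

# Hypothetical helper (not part of cenpy): split an ACS-style variable
# name into its table identifier, position in the table, and suffix.
_VARIABLE_PATTERN = re.compile(r"^([A-Z]\d{5}[A-Z]*)_(\d{3})([A-Z]*)$")


def split_variable(name):
    match = _VARIABLE_PATTERN.match(name)
    if match is None:
        raise ValueError(f"{name!r} does not look like TABLE_NNN[suffix]")
    table, position, suffix = match.groups()
    return table, int(position), suffix


print(split_variable("B03002_012E"))   # ('B03002', 12, 'E')
print(split_variable("B01001A_003M"))  # ('B01001A', 3, 'M')
```

Geography fields such as `STATE` or `TRACT` do not follow this pattern, so the helper raises a `ValueError` for them rather than guessing.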
Still, there are far too many tables to inspect individually. And tables only provides the main tables, not the cross-tabulations by race, sex, or age, which are exposed in crosstab_tables. We therefore need an efficient way to filter the set of tables (or variables) to focus on a specific topic. The filter_tables and filter_variables methods make this simple: you can filter based on table names or on text within the description of the table or variable. For instance, to focus on all tables that mention "race" in the ACS, you can use:
acs.filter_tables('RACE', by='description')
table_name | description | columns |
---|---|---|
B02001 | RACE | [B02001_001E, B02001_002E, B02001_003E, B02001... |
B02008 | WHITE ALONE OR IN COMBINATION WITH ONE OR MORE... | [B02008_001E] |
B02009 | BLACK OR AFRICAN AMERICAN ALONE OR IN COMBINAT... | [B02009_001E] |
B02010 | AMERICAN INDIAN AND ALASKA NATIVE ALONE OR IN ... | [B02010_001E] |
B02011 | ASIAN ALONE OR IN COMBINATION WITH ONE OR MORE... | [B02011_001E] |
B02012 | NATIVE HAWAIIAN AND OTHER PACIFIC ISLANDER ALO... | [B02012_001E] |
B02013 | SOME OTHER RACE ALONE OR IN COMBINATION WITH O... | [B02013_001E] |
B03002 | HISPANIC OR LATINO ORIGIN BY RACE | [B03002_001E, B03002_002E, B03002_003E, B03002... |
B25006 | RACE OF HOUSEHOLDER | [B25006_001E, B25006_002E, B25006_003E, B25006... |
B98013 | TOTAL POPULATION COVERAGE RATE BY WEIGHTING RA... | [B98013_001E, B98013_002E, B98013_003E, B98013... |
B99021 | ALLOCATION OF RACE | [B99021_001E, B99021_002E, B99021_003E] |
C02003 | DETAILED RACE | [C02003_001E, C02003_002E, C02003_003E, C02003... |
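Conceptually, filtering by description is a pattern match over the table descriptions. The sketch below mimics that idea on a hand-copied subset of the descriptions shown above. `filter_by_description` is a hypothetical stand-in, and it makes no claim about how cenpy's `filter_tables` is actually implemented.

```python
import re

# A few table descriptions copied from the listings above.
tables = {
    "B01003": "TOTAL POPULATION",
    "B02001": "RACE",
    "B03002": "HISPANIC OR LATINO ORIGIN BY RACE",
    "B03003": "HISPANIC OR LATINO ORIGIN",
    "B25006": "RACE OF HOUSEHOLDER",
}


def filter_by_description(tables, pattern):
    """Keep the tables whose description matches the regular expression."""
    regex = re.compile(pattern)
    return {name: text for name, text in tables.items() if regex.search(text)}


print(sorted(filter_by_description(tables, "RACE")))
print(sorted(filter_by_description(tables, "HISPANIC")))
```

Note that B03002 appears in both results, which is why it is a natural candidate for studying Hispanic and non-Hispanic populations together.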
To focus on tables that mention Hispanic or non-Hispanic origin:
acs.filter_tables('HISPANIC', by='description')
table_name | description | columns |
---|---|---|
B03001 | HISPANIC OR LATINO ORIGIN BY SPECIFIC ORIGIN | [B03001_001E, B03001_002E, B03001_003E, B03001... |
B03002 | HISPANIC OR LATINO ORIGIN BY RACE | [B03002_001E, B03002_002E, B03002_003E, B03002... |
B03003 | HISPANIC OR LATINO ORIGIN | [B03003_001E, B03003_002E, B03003_003E] |
B16006 | LANGUAGE SPOKEN AT HOME BY ABILITY TO SPEAK EN... | [B16006_001E, B16006_002E, B16006_003E, B16006... |
B98013 | TOTAL POPULATION COVERAGE RATE BY WEIGHTING RA... | [B98013_001E, B98013_002E, B98013_003E, B98013... |
B99031 | ALLOCATION OF HISPANIC OR LATINO ORIGIN | [B99031_001E, B99031_002E, B99031_003E] |
Since B03002 looks like a good table to focus on, we can narrow down the variables we are interested in using filter_variables:
acs.filter_variables('B03002')
 | attributes | concept | group | label | limit | predicateOnly | predicateType | required | values |
---|---|---|---|---|---|---|---|---|---|
B03002_021E | B03002_021M,B03002_021MA,B03002_021EA | HISPANIC OR LATINO ORIGIN BY RACE | B03002 | Estimate!!Total!!Hispanic or Latino!!Two or mo... | 0 | NaN | int | NaN | NaN |
B03002_020E | B03002_020M,B03002_020MA,B03002_020EA | HISPANIC OR LATINO ORIGIN BY RACE | B03002 | Estimate!!Total!!Hispanic or Latino!!Two or mo... | 0 | NaN | int | NaN | NaN |
B03002_001E | B03002_001M,B03002_001MA,B03002_001EA | HISPANIC OR LATINO ORIGIN BY RACE | B03002 | Estimate!!Total | 0 | NaN | int | NaN | NaN |
B03002_005E | B03002_005M,B03002_005MA,B03002_005EA | HISPANIC OR LATINO ORIGIN BY RACE | B03002 | Estimate!!Total!!Not Hispanic or Latino!!Ameri... | 0 | NaN | int | NaN | NaN |
B03002_004E | B03002_004M,B03002_004MA,B03002_004EA | HISPANIC OR LATINO ORIGIN BY RACE | B03002 | Estimate!!Total!!Not Hispanic or Latino!!Black... | 0 | NaN | int | NaN | NaN |
B03002_003E | B03002_003M,B03002_003MA,B03002_003EA | HISPANIC OR LATINO ORIGIN BY RACE | B03002 | Estimate!!Total!!Not Hispanic or Latino!!White... | 0 | NaN | int | NaN | NaN |
B03002_002E | B03002_002M,B03002_002MA,B03002_002EA | HISPANIC OR LATINO ORIGIN BY RACE | B03002 | Estimate!!Total!!Not Hispanic or Latino | 0 | NaN | int | NaN | NaN |
B03002_009E | B03002_009M,B03002_009MA,B03002_009EA | HISPANIC OR LATINO ORIGIN BY RACE | B03002 | Estimate!!Total!!Not Hispanic or Latino!!Two o... | 0 | NaN | int | NaN | NaN |
B03002_007E | B03002_007M,B03002_007MA,B03002_007EA | HISPANIC OR LATINO ORIGIN BY RACE | B03002 | Estimate!!Total!!Not Hispanic or Latino!!Nativ... | 0 | NaN | int | NaN | NaN |
B03002_008E | B03002_008M,B03002_008MA,B03002_008EA | HISPANIC OR LATINO ORIGIN BY RACE | B03002 | Estimate!!Total!!Not Hispanic or Latino!!Some ... | 0 | NaN | int | NaN | NaN |
B03002_006E | B03002_006M,B03002_006MA,B03002_006EA | HISPANIC OR LATINO ORIGIN BY RACE | B03002 | Estimate!!Total!!Not Hispanic or Latino!!Asian... | 0 | NaN | int | NaN | NaN |
B03002_013E | B03002_013M,B03002_013MA,B03002_013EA | HISPANIC OR LATINO ORIGIN BY RACE | B03002 | Estimate!!Total!!Hispanic or Latino!!White alone | 0 | NaN | int | NaN | NaN |
B03002_012E | B03002_012M,B03002_012MA,B03002_012EA | HISPANIC OR LATINO ORIGIN BY RACE | B03002 | Estimate!!Total!!Hispanic or Latino | 0 | NaN | int | NaN | NaN |
B03002_011E | B03002_011M,B03002_011MA,B03002_011EA | HISPANIC OR LATINO ORIGIN BY RACE | B03002 | Estimate!!Total!!Not Hispanic or Latino!!Two o... | 0 | NaN | int | NaN | NaN |
B03002_010E | B03002_010M,B03002_010MA,B03002_010EA | HISPANIC OR LATINO ORIGIN BY RACE | B03002 | Estimate!!Total!!Not Hispanic or Latino!!Two o... | 0 | NaN | int | NaN | NaN |
B03002_017E | B03002_017M,B03002_017MA,B03002_017EA | HISPANIC OR LATINO ORIGIN BY RACE | B03002 | Estimate!!Total!!Hispanic or Latino!!Native Ha... | 0 | NaN | int | NaN | NaN |
B03002_016E | B03002_016M,B03002_016MA,B03002_016EA | HISPANIC OR LATINO ORIGIN BY RACE | B03002 | Estimate!!Total!!Hispanic or Latino!!Asian alone | 0 | NaN | int | NaN | NaN |
B03002_015E | B03002_015M,B03002_015MA,B03002_015EA | HISPANIC OR LATINO ORIGIN BY RACE | B03002 | Estimate!!Total!!Hispanic or Latino!!American ... | 0 | NaN | int | NaN | NaN |
B03002_014E | B03002_014M,B03002_014MA,B03002_014EA | HISPANIC OR LATINO ORIGIN BY RACE | B03002 | Estimate!!Total!!Hispanic or Latino!!Black or ... | 0 | NaN | int | NaN | NaN |
B03002_018E | B03002_018M,B03002_018MA,B03002_018EA | HISPANIC OR LATINO ORIGIN BY RACE | B03002 | Estimate!!Total!!Hispanic or Latino!!Some othe... | 0 | NaN | int | NaN | NaN |
B03002_019E | B03002_019M,B03002_019MA,B03002_019EA | HISPANIC OR LATINO ORIGIN BY RACE | B03002 | Estimate!!Total!!Hispanic or Latino!!Two or mo... | 0 | NaN | int | NaN | NaN |
There, we see that the relevant columns are those measuring the full population, the Hispanic population, and the non-Hispanic population:
hispanic = ['B03002_001',  # full population
            'B03002_002',  # non-Hispanic
            'B03002_012',  # Hispanic
            ]
Altogether, grabbing the data for a city can be done using the from_place method:
phoenix = acs.from_place('Phoenix, AZ', variables=hispanic)
/home/lw17329/Dropbox/dev/cenpy/cenpy/geoparser.py:214: UserWarning: Shape is invalid: Ring Self-intersection[-12486597.5213 3939710.1975] tell_user('Shape is invalid: \n{}'.format(vexplain))
Matched: Phoenix, AZ to Phoenix city within layer Incorporated Places
With this, we can use contextily to grab a basemap:
phoenix_basemap, phoenix_extent = contextily.bounds2img(*phoenix.total_bounds, zoom=10,
url=contextily.tile_providers.ST_TONER_LITE)
And plot the Hispanic share of the population in each tract:
f,ax = plt.subplots(1,1, figsize=(10,10))
ax.imshow(phoenix_basemap, extent=phoenix_extent, interpolation='sinc')
phoenix['pct_hispanic'] = phoenix.eval('B03002_012E / B03002_001E')
phoenix.plot('pct_hispanic', cmap='plasma', ax = ax, alpha=.2)
To compute segregation in Phoenix for the 2017 five-year ACS, the segregation package takes the dataframe and the names of the columns containing the group under study and the total population. Here, we estimate the Massey-Denton Dissimilarity statistic using the segregation.aspatial.Dissim estimator:
seg_phoenix = segregation.aspatial.Dissim(phoenix,
group_pop_var='B03002_012E',
total_pop_var='B03002_001E')
Thus, for 2017, the Hispanic/non-Hispanic dissimilarity index for Phoenix, measured at the census tract level, is:
seg_phoenix.statistic
0.5004851624821972
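For reference, the dissimilarity index is conventionally defined as half the sum, over tracts, of the absolute difference between each tract's share of the group under study and its share of everyone else. The sketch below applies that textbook formula to made-up tract counts. It is not the segregation package's own code, and `dissimilarity` is a hypothetical helper written only for illustration.

```python
def dissimilarity(tracts):
    """Dissimilarity index for a list of (group_count, total_count) tracts."""
    group_total = sum(group for group, _ in tracts)
    rest_total = sum(total - group for group, total in tracts)
    if group_total == 0 or rest_total == 0:
        return 0.0  # only one group present, so no unevenness can be measured
    return 0.5 * sum(
        abs(group / group_total - (total - group) / rest_total)
        for group, total in tracts
    )


# Made-up tracts: (group population, total population).
even = [(50, 100), (50, 100), (50, 100)]
uneven = [(10, 100), (50, 100), (90, 100)]
separate = [(100, 100), (0, 100)]

print(dissimilarity(even))      # 0.0
print(dissimilarity(uneven))    # about 0.533
print(dissimilarity(separate))  # 1.0
```

A value of 0 means every tract mirrors the citywide composition, and a value of 1 means the two groups share no tracts at all. Phoenix's value of about 0.5 sits between these extremes.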
While this computes the dissimilarity metric, it does not conduct inference on that value. segregation has a generic testing framework, segregation.inference, that can estimate and re-estimate segregation indices under certain assumptions. Below, we'll compute the segregation of random maps, assuming populations are randomly distributed across the map.
phx_test = segregation.inference.SingleValueTest(seg_phoenix)
Then, we can plot this to compare the segregation in our random Phoenix maps to the Phoenix we did observe in 2017:
phx_test.plot()
Thus, Phoenix's Hispanic/non-Hispanic dissimilarity value is very different from the values we would expect if populations were randomly distributed across the city.
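The logic of comparing an observed index against random maps can be sketched in plain Python. The example below is a simplified illustration on made-up counts: each resident's group membership is drawn independently of their tract, the dissimilarity index is recomputed for every simulated city, and the observed value is compared against that reference distribution. The helper names are hypothetical, and the details of what `SingleValueTest` does internally may differ.

```python
import random


def dissimilarity(tracts):
    """Dissimilarity index for a list of (group_count, total_count) tracts."""
    group_total = sum(group for group, _ in tracts)
    rest_total = sum(total - group for group, total in tracts)
    if group_total == 0 or rest_total == 0:
        return 0.0
    return 0.5 * sum(
        abs(group / group_total - (total - group) / rest_total)
        for group, total in tracts
    )


def simulate_even_city(totals, group_share, rng):
    """Draw each resident's group membership independently of their tract."""
    return [
        (sum(rng.random() < group_share for _ in range(total)), total)
        for total in totals
    ]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
observed_tracts = [(10, 100), (50, 100), (90, 100)]  # made-up counts
observed = dissimilarity(observed_tracts)

totals = [total for _, total in observed_tracts]
share = sum(group for group, _ in observed_tracts) / sum(totals)
simulated = [
    dissimilarity(simulate_even_city(totals, share, rng)) for _ in range(500)
]

# Pseudo p-value: how often a random city is at least as segregated as observed.
p_value = (sum(d >= observed for d in simulated) + 1) / (len(simulated) + 1)
print(observed, max(simulated), p_value)
```

When the observed value lies far outside the simulated distribution, as it does here, the pseudo p-value is small and the observed segregation is unlikely to be an artifact of chance alone.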
cenpy exposes ACSs back to 2013. Thus, we can get the earliest ACS data for Phoenix available from the API using:
phoenix_2013 = cenpy.products.ACS(2013).from_place('Phoenix, AZ', variables=hispanic)
phoenix_2013['pct_hispanic'] = phoenix_2013.eval('B03002_012E / B03002_001E')
/home/lw17329/Dropbox/dev/cenpy/cenpy/geoparser.py:214: UserWarning: Shape is invalid: Ring Self-intersection[-12486597.5213 3939710.1975] tell_user('Shape is invalid: \n{}'.format(vexplain))
Matched: Phoenix, AZ to Phoenix city within layer Incorporated Places
And, we can compare the spatial distributions visually:
f,ax = plt.subplots(1,3, figsize=(20,10), sharex=True, sharey=True)
for ax_ in ax:
    ax_.imshow(phoenix_basemap, extent=phoenix_extent, interpolation='sinc')
phoenix.plot('pct_hispanic', cmap='plasma', ax = ax[1], alpha=.4)
phoenix_2013.plot('pct_hispanic', cmap='plasma', ax = ax[0], alpha=.4)
phoenix.merge(phoenix_2013.drop('geometry',axis=1), on='GEOID', suffixes=('_2017', '_2013'))\
.eval('pct_change = (pct_hispanic_2017 - pct_hispanic_2013)/(pct_hispanic_2013)')\
.plot('pct_change', cmap='bwr_r', ax=ax[2], alpha=.4, vmin=-.5, vmax=.5, legend=True)
f.tight_layout()
ax[0].axis(phoenix.total_bounds[[0,2,1,3]])
ax[0].set_title('Hispanic %, 2013', fontsize=20)
ax[1].set_title('Hispanic %, 2017', fontsize=20)
ax[2].set_title('Relative Change', fontsize=20)
To compute the segregation index in 2013, we use the same strategy as before:
seg_phoenix_2013 = segregation.aspatial.Dissim(phoenix_2013,
group_pop_var='B03002_012E',
total_pop_var='B03002_001E')
seg_phoenix_2013.statistic
0.5234336505646645
Now, though, with two statistics (one in 2013 and one in 2017), we can compare the two probabilistically using segregation.inference.TwoValueTest:
time_comparison = segregation.inference.TwoValueTest(seg_phoenix, seg_phoenix_2013)
Subjectively, we saw that the statistics were pretty similar. Objectively, the simulation-based inference confirms this intuition. Our estimated difference suggests that the dissimilarity index dropped slightly (from 0.52 in 2013 to 0.50 in 2017). But this drop is within what we'd expect, given the uncertainty in estimating the two segregation indices. In the plot, the red line is the estimated difference between the two segregation indices, and the blue histogram shows the distribution of simulated differences, which reflects that uncertainty:
time_comparison.plot()
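One way to build such a distribution of simulated differences is random labelling: pool the tracts from both datasets, repeatedly shuffle which dataset each tract is assigned to, and recompute the difference in the index each time. The sketch below illustrates this on made-up counts. Whether `TwoValueTest` follows exactly this procedure is an assumption here, and `random_label_test` is a hypothetical helper rather than part of the segregation package.

```python
import random


def dissimilarity(tracts):
    """Dissimilarity index for a list of (group_count, total_count) tracts."""
    group_total = sum(group for group, _ in tracts)
    rest_total = sum(total - group for group, total in tracts)
    if group_total == 0 or rest_total == 0:
        return 0.0
    return 0.5 * sum(
        abs(group / group_total - (total - group) / rest_total)
        for group, total in tracts
    )


def random_label_test(tracts_a, tracts_b, n_sim=500, seed=0):
    """Compare two indices by shuffling tracts between the two datasets."""
    rng = random.Random(seed)
    observed = dissimilarity(tracts_a) - dissimilarity(tracts_b)
    pooled = list(tracts_a) + list(tracts_b)
    simulated = []
    for _ in range(n_sim):
        rng.shuffle(pooled)
        fake_a, fake_b = pooled[:len(tracts_a)], pooled[len(tracts_a):]
        simulated.append(dissimilarity(fake_a) - dissimilarity(fake_b))
    # Two-sided pseudo p-value for the observed difference.
    extreme = sum(abs(d) >= abs(observed) for d in simulated)
    return observed, (extreme + 1) / (n_sim + 1)


# Made-up data: one highly uneven set of tracts and one perfectly even set.
uneven = [(5, 100), (95, 100)] * 3
even = [(50, 100)] * 6

difference, p_value = random_label_test(uneven, even)
print(difference, p_value)
```

In this toy case the difference is large relative to the shuffled differences, so the pseudo p-value is small. For the two Phoenix estimates, by contrast, the observed difference falls well inside the simulated distribution.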
Cenpy works on any place that's recognized in census places. If we want to compare segregation between different cities, we can also do this with cenpy and segregation. For instance, to get the ACS data for Austin, Texas:
austin = acs.from_place('Austin, TX', variables=hispanic)
/home/lw17329/Dropbox/dev/cenpy/cenpy/geoparser.py:214: UserWarning: Shape is invalid: Ring Self-intersection[-10884881.1468 3554135.7868] tell_user('Shape is invalid: \n{}'.format(vexplain))
Matched: Austin, TX to Austin city within layer Incorporated Places
Just like before, we can get basemaps using contextily and make nice maps:
austin_basemap, austin_extent = contextily.bounds2img(*austin.total_bounds, zoom=12, url=contextily.tile_providers.ST_TONER_LITE)
f,ax = plt.subplots(1,2, figsize=(10,10))
ax[0].imshow(austin_basemap, extent=austin_extent, interpolation='sinc')
ax[1].imshow(phoenix_basemap, extent=phoenix_extent, interpolation='sinc')
austin.eval('pct_hispanic = B03002_012E / B03002_001E').plot('pct_hispanic', cmap='plasma', ax = ax[0], alpha=.4)
phoenix.plot('pct_hispanic', cmap='plasma', ax=ax[1], alpha=.4)
ax[1].axis(phoenix.total_bounds[[0,2,1,3]])
ax[0].set_title('Hispanic % (Austin)', fontsize=20)
ax[1].set_title('Hispanic % (Phoenix)', fontsize=20)
Estimating the difference between segregation in the two cities is difficult. We can simply compare the raw estimated Dissimilarity scores:
seg_austin = segregation.aspatial.Dissim(austin, group_pop_var='B03002_012E', total_pop_var='B03002_001E')
seg_austin.statistic, seg_phoenix.statistic
(0.421753840915938, 0.5004851624821972)
This doesn't take into account the intrinsic uncertainty in that Dissimilarity estimate. Further, it's hard to understand how to compare the two cities, given that they've got distinctive geographical structures. But, the segregation package takes care of this automatically:
test = segregation.inference.TwoValueTest(seg_austin, seg_phoenix)
test.plot()
Thus, we see that Austin is significantly less segregated than Phoenix, even when accounting for the uncertainty around estimating the Dissimilarity metric.