By Christine Zhang (Knight-Mozilla / Los Angeles Times) & Ryan Menezes (Los Angeles Times)
IRE Conference -- New Orleans, LA
June 18, 2016
This workshop is a basic introduction to R, free, open-source software for data analysis and statistics.
R is a powerful tool that can help you quickly and effectively answer questions using data.
Take our host city, New Orleans, for example. Hurricane Katrina was a devastating natural disaster that substantially affected the population of New Orleans. The hurricane took place in August 2005, which coincidentally falls between the U.S. Census full population counts in 2000 and 2010.
We will use the "Demographic Profile" -- a large summary file of demographic variables downloaded from the U.S. Census Bureau website -- for 2000 and 2010, covering every census tract in the state of Louisiana.
In this session, we will:
Basic analysis techniques like the ones you will learn in this class can help you write data-driven stories, like this one written by The Times-Picayune shortly after the census released its 2010 tally.
The story begins:
Five years after Hurricane Katrina emptied New Orleans and prompted the largest mass migration in modern American history, the 2010 Census counted 343,829 people living in the still-recovering city, a 29 percent drop since the last head count a decade ago, according to data released today.
Using the data we have, we will attempt to replicate the calculations in that lede.
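The 29 percent figure in the lede is presumably a standard percent change between the two head counts. As a sketch, writing $P_{2000}$ and $P_{2010}$ for the total population of New Orleans in each census (symbols introduced here for illustration, not names from the data files):

$$
\text{percent change} = \frac{P_{2010} - P_{2000}}{P_{2000}} \times 100
$$

The lede gives $P_{2010} = 343{,}829$. $P_{2000}$ is the number we need to pull out of the 2000 file, and a result close to $-29$ would match the story.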
The following code and annotations were written in a Jupyter notebook. The code is best run in RStudio version 0.99.902 using R version 3.3.0.
We'll start by loading in the 2000 data, which is stored in a CSV (comma-separated values) file. CSVs are plain-text files of data where commas separate the columns within a line. It is sometimes preferable to work with CSVs as opposed to files of a proprietary format, such as Microsoft Excel files, but the Census Bureau readily makes data available in both formats.
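To make the format concrete, here is a minimal sketch of what a CSV looks like as raw text and how R turns it into a table. The two-column layout and the name `toy_csv` are made up for illustration and are not the structure of the census file. The `text` argument of `read.csv` lets us parse a string directly, so no file on disk is needed.

```r
# A tiny, made-up CSV held in a string: one header line, then one line per row.
# "\n" marks the end of each line; commas separate the columns.
toy_csv <- "tract,population\n9601,6188\n9602,5056"

# Show the raw text exactly as it would appear in a plain-text editor.
writeLines(toy_csv)

# Parse the same text into a table with two columns and two rows.
toy_table <- read.csv(text = toy_csv)
toy_table
```

The same call with a file name in place of `text = ...` reads a CSV stored on disk.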
Let's run R's `read.csv` command and save the data to an object called `census2000`. Here, we are using assignment with `<-`, which tells R to run the right side and assign the result to the object named on the left.
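A small sketch of assignment on its own, before we apply it to the census file. The object name `tract_count` is hypothetical and used only for this illustration.

```r
# The right side (2 + 3) is evaluated first; the result is stored as tract_count.
tract_count <- 2 + 3

# Typing an object's name on its own line prints the stored value.
tract_count

# Assigning to the same name again replaces the old value.
tract_count <- tract_count * 10
tract_count
```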
census2000 <- read.csv('2000_census_demographic_profile.csv')
Now that this ran without incident, let's inspect the first few rows using `head`, which by default prints the first six rows of a data frame (R's term for a table of rows and columns, much like a spreadsheet):
head(census2000)
GEO.id | GEO.id2 | GEO.display.label | HC01_VC01 | HC02_VC01 | HC01_VC03 | HC02_VC03 | HC01_VC04 | HC02_VC04 | HC01_VC05 | HC02_VC05 | HC01_VC06 | HC02_VC06 | HC01_VC07 | HC02_VC07 | HC01_VC08 | HC02_VC08 | HC01_VC09 | HC02_VC09 | HC01_VC10 | HC02_VC10 | HC01_VC11 | HC02_VC11 | HC01_VC12 | HC02_VC12 | HC01_VC13 | HC02_VC13 | HC01_VC14 | HC02_VC14 | HC01_VC15 | HC02_VC15 | HC01_VC16 | HC02_VC16 | HC01_VC17 | HC02_VC17 | HC01_VC18 | HC02_VC18 | HC01_VC19 | HC02_VC19 | HC01_VC20 | HC02_VC20 | HC01_VC21 | HC02_VC21 | HC01_VC22 | HC02_VC22 | HC01_VC23 | HC02_VC23 | HC01_VC24 | HC02_VC24 | HC01_VC25 | HC02_VC25 | HC01_VC26 | HC02_VC26 | HC01_VC28 | HC02_VC28 | HC01_VC29 | HC02_VC29 | HC01_VC30 | HC02_VC30 | HC01_VC31 | HC02_VC31 | HC01_VC32 | HC02_VC32 | HC01_VC33 | HC02_VC33 | HC01_VC34 | HC02_VC34 | HC01_VC35 | HC02_VC35 | HC01_VC36 | HC02_VC36 | HC01_VC37 | HC02_VC37 | HC01_VC38 | HC02_VC38 | HC01_VC39 | HC02_VC39 | HC01_VC40 | HC02_VC40 | HC01_VC41 | HC02_VC41 | HC01_VC42 | HC02_VC42 | HC01_VC43 | HC02_VC43 | HC01_VC44 | HC02_VC44 | HC01_VC45 | HC02_VC45 | HC01_VC46 | HC02_VC46 | HC01_VC48 | HC02_VC48 | HC01_VC49 | HC02_VC49 | HC01_VC50 | HC02_VC50 | HC01_VC51 | HC02_VC51 | HC01_VC52 | HC02_VC52 | HC01_VC53 | HC02_VC53 | HC01_VC55 | HC02_VC55 | HC01_VC56 | HC02_VC56 | HC01_VC57 | HC02_VC57 | HC01_VC58 | HC02_VC58 | HC01_VC59 | HC02_VC59 | HC01_VC60 | HC02_VC60 | HC01_VC61 | HC02_VC61 | HC01_VC62 | HC02_VC62 | HC01_VC64 | HC02_VC64 | HC01_VC65 | HC02_VC65 | HC01_VC66 | HC02_VC66 | HC01_VC67 | HC02_VC67 | HC01_VC68 | HC02_VC68 | HC01_VC69 | HC02_VC69 | HC01_VC70 | HC02_VC70 | HC01_VC71 | HC02_VC71 | HC01_VC72 | HC02_VC72 | HC01_VC73 | HC02_VC73 | HC01_VC74 | HC02_VC74 | HC01_VC75 | HC02_VC75 | HC01_VC76 | HC02_VC76 | HC01_VC78 | HC02_VC78 | HC01_VC79 | HC02_VC79 | HC01_VC80 | HC02_VC80 | HC01_VC81 | HC02_VC81 | HC01_VC82 | HC02_VC82 | HC01_VC83 | HC02_VC83 | HC01_VC84 | HC02_VC84 | HC01_VC85 | HC02_VC85 | HC01_VC86 | HC02_VC86 | HC01_VC87 | HC02_VC87 | HC01_VC88 | 
HC02_VC88 | HC01_VC89 | HC02_VC89 | HC01_VC90 | HC02_VC90 | HC01_VC91 | HC02_VC91 | HC01_VC93 | HC02_VC93 | HC01_VC94 | HC02_VC94 | HC01_VC95 | HC02_VC95 | HC01_VC96 | HC02_VC96 | HC01_VC97 | HC02_VC97 | HC01_VC98 | HC02_VC98 | HC01_VC100 | HC02_VC100 | HC01_VC101 | HC02_VC101 | HC01_VC102 | HC02_VC102 | HC01_VC103 | HC02_VC103 | HC01_VC104 | HC02_VC104 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Id | Id2 | Geography | Number; Total population | Percent; Total population | Number; Total population - SEX AND AGE - Male | Percent; Total population - SEX AND AGE - Male | Number; Total population - SEX AND AGE - Female | Percent; Total population - SEX AND AGE - Female | Number; Total population - SEX AND AGE - Under 5 years | Percent; Total population - SEX AND AGE - Under 5 years | Number; Total population - SEX AND AGE - 5 to 9 years | Percent; Total population - SEX AND AGE - 5 to 9 years | Number; Total population - SEX AND AGE - 10 to 14 years | Percent; Total population - SEX AND AGE - 10 to 14 years | Number; Total population - SEX AND AGE - 15 to 19 years | Percent; Total population - SEX AND AGE - 15 to 19 years | Number; Total population - SEX AND AGE - 20 to 24 years | Percent; Total population - SEX AND AGE - 20 to 24 years | Number; Total population - SEX AND AGE - 25 to 34 years | Percent; Total population - SEX AND AGE - 25 to 34 years | Number; Total population - SEX AND AGE - 35 to 44 years | Percent; Total population - SEX AND AGE - 35 to 44 years | Number; Total population - SEX AND AGE - 45 to 54 years | Percent; Total population - SEX AND AGE - 45 to 54 years | Number; Total population - SEX AND AGE - 55 to 59 years | Percent; Total population - SEX AND AGE - 55 to 59 years | Number; Total population - SEX AND AGE - 60 to 64 years | Percent; Total population - SEX AND AGE - 60 to 64 years | Number; Total population - SEX AND AGE - 65 to 74 years | Percent; Total population - SEX AND AGE - 65 to 74 years | Number; Total population - SEX AND AGE - 75 to 84 years | Percent; Total population - SEX AND AGE - 75 to 84 years | Number; Total population - SEX AND AGE - 85 years and over | Percent; Total population - SEX AND AGE - 85 years and over | Number; Total population - SEX AND AGE - Median age (years) | Percent; Total population - SEX AND AGE - Median age (years) | Number; Total population - SEX AND AGE - 18 years and over | Percent; 
Total population - SEX AND AGE - 18 years and over | Number; Total population - SEX AND AGE - 18 years and over - Male | Percent; Total population - SEX AND AGE - 18 years and over - Male | Number; Total population - SEX AND AGE - 18 years and over - Female | Percent; Total population - SEX AND AGE - 18 years and over - Female | Number; Total population - SEX AND AGE - 21 years and over | Percent; Total population - SEX AND AGE - 21 years and over | Number; Total population - SEX AND AGE - 62 years and over | Percent; Total population - SEX AND AGE - 62 years and over | Number; Total population - SEX AND AGE - 65 years and over | Percent; Total population - SEX AND AGE - 65 years and over | Number; Total population - SEX AND AGE - 65 years and over - Male | Percent; Total population - SEX AND AGE - 65 years and over - Male | Number; Total population - SEX AND AGE - 65 years and over - Female | Percent; Total population - SEX AND AGE - 65 years and over - Female | Number; Total population - RACE - One race | Percent; Total population - RACE - One race | Number; Total population - RACE - One race - White | Percent; Total population - RACE - One race - White | Number; Total population - RACE - One race - Black or African American | Percent; Total population - RACE - One race - Black or African American | Number; Total population - RACE - One race - American Indian and Alaska Native | Percent; Total population - RACE - One race - American Indian and Alaska Native | Number; Total population - RACE - One race - Asian | Percent; Total population - RACE - One race - Asian | Number; Total population - RACE - One race - Asian - Asian Indian | Percent; Total population - RACE - One race - Asian - Asian Indian | Number; Total population - RACE - One race - Asian - Chinese | Percent; Total population - RACE - One race - Asian - Chinese | Number; Total population - RACE - One race - Asian - Filipino | Percent; Total population - RACE - One race - Asian - Filipino | Number; Total 
population - RACE - One race - Asian - Japanese | Percent; Total population - RACE - One race - Asian - Japanese | Number; Total population - RACE - One race - Asian - Korean | Percent; Total population - RACE - One race - Asian - Korean | Number; Total population - RACE - One race - Asian - Vietnamese | Percent; Total population - RACE - One race - Asian - Vietnamese | Number; Total population - RACE - One race - Asian - Other Asian [1] | Percent; Total population - RACE - One race - Asian - Other Asian [1] | Number; Total population - RACE - One race - Native Hawaiian and Other Pacific Islander | Percent; Total population - RACE - One race - Native Hawaiian and Other Pacific Islander | Number; Total population - RACE - One race - Native Hawaiian and Other Pacific Islander - Native Hawaiian | Percent; Total population - RACE - One race - Native Hawaiian and Other Pacific Islander - Native Hawaiian | Number; Total population - RACE - One race - Native Hawaiian and Other Pacific Islander - Guamanian or Chamorro | Percent; Total population - RACE - One race - Native Hawaiian and Other Pacific Islander - Guamanian or Chamorro | Number; Total population - RACE - One race - Native Hawaiian and Other Pacific Islander - Samoan | Percent; Total population - RACE - One race - Native Hawaiian and Other Pacific Islander - Samoan | Number; Total population - RACE - One race - Native Hawaiian and Other Pacific Islander - Other Pacific Islander [2] | Percent; Total population - RACE - One race - Native Hawaiian and Other Pacific Islander - Other Pacific Islander [2] | Number; Total population - RACE - One race - Some other race | Percent; Total population - RACE - One race - Some other race | Number; Total population - RACE - Two or more races | Percent; Total population - RACE - Two or more races | Number; Total population - RACE - Race alone or in combination with one or more other races [3] - White | Percent; Total population - RACE - Race alone or in combination with one or 
more other races [3] - White | Number; Total population - RACE - Race alone or in combination with one or more other races [3] - Black or African American | Percent; Total population - RACE - Race alone or in combination with one or more other races [3] - Black or African American | Number; Total population - RACE - Race alone or in combination with one or more other races [3] - American Indian and Alaska Native | Percent; Total population - RACE - Race alone or in combination with one or more other races [3] - American Indian and Alaska Native | Number; Total population - RACE - Race alone or in combination with one or more other races [3] - Asian | Percent; Total population - RACE - Race alone or in combination with one or more other races [3] - Asian | Number; Total population - RACE - Race alone or in combination with one or more other races [3] - Native Hawaiian and Other Pacific Islander | Percent; Total population - RACE - Race alone or in combination with one or more other races [3] - Native Hawaiian and Other Pacific Islander | Number; Total population - RACE - Race alone or in combination with one or more other races [3] - Some other race | Percent; Total population - RACE - Race alone or in combination with one or more other races [3] - Some other race | Number; HISPANIC OR LATINO AND RACE - Total population | Percent; HISPANIC OR LATINO AND RACE - Total population | Number; HISPANIC OR LATINO AND RACE - Total population - Hispanic or Latino (of any race) | Percent; HISPANIC OR LATINO AND RACE - Total population - Hispanic or Latino (of any race) | Number; HISPANIC OR LATINO AND RACE - Total population - Hispanic or Latino (of any race) - Mexican | Percent; HISPANIC OR LATINO AND RACE - Total population - Hispanic or Latino (of any race) - Mexican | Number; HISPANIC OR LATINO AND RACE - Total population - Hispanic or Latino (of any race) - Puerto Rican | Percent; HISPANIC OR LATINO AND RACE - Total population - Hispanic or Latino (of any race) - Puerto 
Rican | Number; HISPANIC OR LATINO AND RACE - Total population - Hispanic or Latino (of any race) - Cuban | Percent; HISPANIC OR LATINO AND RACE - Total population - Hispanic or Latino (of any race) - Cuban | Number; HISPANIC OR LATINO AND RACE - Total population - Hispanic or Latino (of any race) - Other Hispanic or Latino | Percent; HISPANIC OR LATINO AND RACE - Total population - Hispanic or Latino (of any race) - Other Hispanic or Latino | Number; HISPANIC OR LATINO AND RACE - Total population - Not Hispanic or Latino | Percent; HISPANIC OR LATINO AND RACE - Total population - Not Hispanic or Latino | Number; HISPANIC OR LATINO AND RACE - Total population - Not Hispanic or Latino - White alone | Percent; HISPANIC OR LATINO AND RACE - Total population - Not Hispanic or Latino - White alone | Number; RELATIONSHIP - Total population | Percent; RELATIONSHIP - Total population | Number; RELATIONSHIP - Total population - In households | Percent; RELATIONSHIP - Total population - In households | Number; RELATIONSHIP - Total population - In households - Householder | Percent; RELATIONSHIP - Total population - In households - Householder | Number; RELATIONSHIP - Total population - In households - Spouse | Percent; RELATIONSHIP - Total population - In households - Spouse | Number; RELATIONSHIP - Total population - In households - Child | Percent; RELATIONSHIP - Total population - In households - Child | Number; RELATIONSHIP - Total population - In households - Child - Own child under 18 years | Percent; RELATIONSHIP - Total population - In households - Child - Own child under 18 years | Number; RELATIONSHIP - Total population - In households - Other relatives | Percent; RELATIONSHIP - Total population - In households - Other relatives | Number; RELATIONSHIP - Total population - In households - Other relatives - Under 18 years | Percent; RELATIONSHIP - Total population - In households - Other relatives - Under 18 years | Number; RELATIONSHIP - Total population - In 
households - Nonrelatives | Percent; RELATIONSHIP - Total population - In households - Nonrelatives | Number; RELATIONSHIP - Total population - In households - Nonrelatives - Unmarried partner | Percent; RELATIONSHIP - Total population - In households - Nonrelatives - Unmarried partner | Number; RELATIONSHIP - Total population - In group quarters | Percent; RELATIONSHIP - Total population - In group quarters | Number; RELATIONSHIP - Total population - In group quarters - Institutionalized population | Percent; RELATIONSHIP - Total population - In group quarters - Institutionalized population | Number; RELATIONSHIP - Total population - In group quarters - Noninstitutionalized population | Percent; RELATIONSHIP - Total population - In group quarters - Noninstitutionalized population | Number; HOUSEHOLDS BY TYPE - Total households | Percent; HOUSEHOLDS BY TYPE - Total households | Number; HOUSEHOLDS BY TYPE - Total households - Family households (families) | Percent; HOUSEHOLDS BY TYPE - Total households - Family households (families) | Number; HOUSEHOLDS BY TYPE - Total households - Family households (families) - With own children under 18 years | Percent; HOUSEHOLDS BY TYPE - Total households - Family households (families) - With own children under 18 years | Number; HOUSEHOLDS BY TYPE - Total households - Family households (families) - Married-couple family | Percent; HOUSEHOLDS BY TYPE - Total households - Family households (families) - Married-couple family | Number; HOUSEHOLDS BY TYPE - Total households - Family households (families) - Married-couple family - With own children under 18 years | Percent; HOUSEHOLDS BY TYPE - Total households - Family households (families) - Married-couple family - With own children under 18 years | Number; HOUSEHOLDS BY TYPE - Total households - Family households (families) - Female householder, no husband present | Percent; HOUSEHOLDS BY TYPE - Total households - Family households (families) - Female householder, no husband 
present | Number; HOUSEHOLDS BY TYPE - Total households - Family households (families) - Female householder, no husband present - With own children under 18 years | Percent; HOUSEHOLDS BY TYPE - Total households - Family households (families) - Female householder, no husband present - With own children under 18 years | Number; HOUSEHOLDS BY TYPE - Total households - Nonfamily households | Percent; HOUSEHOLDS BY TYPE - Total households - Nonfamily households | Number; HOUSEHOLDS BY TYPE - Total households - Nonfamily households - Householder living alone | Percent; HOUSEHOLDS BY TYPE - Total households - Nonfamily households - Householder living alone | Number; HOUSEHOLDS BY TYPE - Total households - Nonfamily households - Householder living alone - Householder 65 years and over | Percent; HOUSEHOLDS BY TYPE - Total households - Nonfamily households - Householder living alone - Householder 65 years and over | Number; HOUSEHOLDS BY TYPE - Total households - Households with individuals under 18 years | Percent; HOUSEHOLDS BY TYPE - Total households - Households with individuals under 18 years | Number; HOUSEHOLDS BY TYPE - Total households - Households with individuals 65 years and over | Percent; HOUSEHOLDS BY TYPE - Total households - Households with individuals 65 years and over | Number; HOUSEHOLDS BY TYPE - Total households - Average household size | Percent; HOUSEHOLDS BY TYPE - Total households - Average household size | Number; HOUSEHOLDS BY TYPE - Total households - Average family size | Percent; HOUSEHOLDS BY TYPE - Total households - Average family size | Number; HOUSING OCCUPANCY - Total housing units | Percent; HOUSING OCCUPANCY - Total housing units | Number; HOUSING OCCUPANCY - Total housing units - Occupied housing units | Percent; HOUSING OCCUPANCY - Total housing units - Occupied housing units | Number; HOUSING OCCUPANCY - Total housing units - Vacant housing units | Percent; HOUSING OCCUPANCY - Total housing units - Vacant housing units | Number; 
HOUSING OCCUPANCY - Total housing units - Vacant housing units - For seasonal, recreational, or occasional use | Percent; HOUSING OCCUPANCY - Total housing units - Vacant housing units - For seasonal, recreational, or occasional use | Number; HOUSING OCCUPANCY - Total housing units - Homeowner vacancy rate (percent) | Percent; HOUSING OCCUPANCY - Total housing units - Homeowner vacancy rate (percent) | Number; HOUSING OCCUPANCY - Total housing units - Rental vacancy rate (percent) | Percent; HOUSING OCCUPANCY - Total housing units - Rental vacancy rate (percent) | Number; HOUSING TENURE - Occupied housing units | Percent; HOUSING TENURE - Occupied housing units | Number; HOUSING TENURE - Occupied housing units - Owner-occupied housing units | Percent; HOUSING TENURE - Occupied housing units - Owner-occupied housing units | Number; HOUSING TENURE - Occupied housing units - Renter-occupied housing units | Percent; HOUSING TENURE - Occupied housing units - Renter-occupied housing units | Number; HOUSING TENURE - Occupied housing units - Average household size of owner-occupied unit | Percent; HOUSING TENURE - Occupied housing units - Average household size of owner-occupied unit | Number; HOUSING TENURE - Occupied housing units - Average household size of renter-occupied unit | Percent; HOUSING TENURE - Occupied housing units - Average household size of renter-occupied unit |
2 | 1400000US22001960100 | 22001960100 | Census Tract 9601, Acadia Parish, Louisiana | 6,188 | 100 | 2,920 | 47 | 3,268 | 53 | 462 | 8 | 502 | 8 | 541 | 9 | 572 | 9 | 375 | 6 | 728 | 12 | 913 | 15 | 699 | 11 | 301 | 5 | 252 | 4 | 433 | 7 | 287 | 5 | 123 | 2 | 34 | (X) | 4,304 | 70 | 1,957 | 32 | 2,347 | 38 | 4,031 | 65 | 996 | 16 | 843 | 14 | 295 | 5 | 548 | 9 | 6,174 | 100 | 4,455 | 72 | 1,675 | 27 | 12 | 0 | 7 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 25 | 0 | 14 | 0 | 4,468 | 72 | 1,677 | 27 | 19 | 0 | 8 | 0 | 0 | 0 | 30 | 1 | 6,188 | 100 | 87 | 1 | 51 | 1 | 1 | 0 | 0 | 0 | 35 | 1 | 6,101 | 99 | 4,398 | 71 | 6,188 | 100 | 6,030 | 97 | 2,236 | 36 | 1,119 | 18 | 2,199 | 36 | 1,700 | 28 | 274 | 4 | 143 | 2 | 202 | 3 | 109 | 2 | 158 | 3 | 151 | 2 | 7 | 0 | 2,236 | 100 | 1,595 | 71 | 868 | 39 | 1,119 | 50 | 573 | 26 | 363 | 16 | 237 | 11 | 641 | 29 | 585 | 26 | 303 | 14 | 962 | 43 | 569 | 25 | 3 | (X) | 3 | (X) | 2,410 | 100 | 2,236 | 93 | 174 | 7 | 15 | 1 | 1 | (X) | 8 | (X) | 2,236 | 100 | 1,526 | 68 | 710 | 32 | 3 | (X) | 3 | (X) |
3 | 1400000US22001960200 | 22001960200 | Census Tract 9602, Acadia Parish, Louisiana | 5,056 | 100 | 2,562 | 51 | 2,494 | 49 | 346 | 7 | 416 | 8 | 476 | 9 | 463 | 9 | 298 | 6 | 579 | 12 | 861 | 17 | 709 | 14 | 250 | 5 | 203 | 4 | 263 | 5 | 150 | 3 | 42 | 1 | 34 | (X) | 3,527 | 70 | 1,758 | 35 | 1,769 | 35 | 3,289 | 65 | 570 | 11 | 455 | 9 | 217 | 4 | 238 | 5 | 5,035 | 100 | 4,799 | 95 | 216 | 4 | 6 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 13 | 0 | 21 | 0 | 4,816 | 95 | 226 | 5 | 13 | 0 | 6 | 0 | 4 | 0 | 16 | 0 | 5,056 | 100 | 35 | 1 | 18 | 0 | 0 | 0 | 0 | 0 | 17 | 0 | 5,021 | 99 | 4,775 | 94 | 5,056 | 100 | 5,056 | 100 | 1,764 | 35 | 1,216 | 24 | 1,791 | 35 | 1,413 | 28 | 173 | 3 | 86 | 2 | 112 | 2 | 61 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1,764 | 100 | 1,408 | 80 | 722 | 41 | 1,216 | 69 | 617 | 35 | 134 | 8 | 81 | 5 | 356 | 20 | 310 | 18 | 128 | 7 | 781 | 44 | 339 | 19 | 3 | (X) | 3 | (X) | 1,909 | 100 | 1,764 | 92 | 145 | 8 | 31 | 2 | 1 | (X) | 7 | (X) | 1,764 | 100 | 1,461 | 83 | 303 | 17 | 3 | (X) | 3 | (X) |
4 | 1400000US22001960300 | 22001960300 | Census Tract 9603, Acadia Parish, Louisiana | 3,149 | 100 | 1,593 | 51 | 1,556 | 49 | 209 | 7 | 251 | 8 | 305 | 10 | 260 | 8 | 204 | 7 | 368 | 12 | 520 | 17 | 409 | 13 | 148 | 5 | 130 | 4 | 209 | 7 | 104 | 3 | 32 | 1 | 35 | (X) | 2,233 | 71 | 1,103 | 35 | 1,130 | 36 | 2,081 | 66 | 435 | 14 | 345 | 11 | 150 | 5 | 195 | 6 | 3,140 | 100 | 3,058 | 97 | 67 | 2 | 8 | 0 | 2 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0 | 9 | 0 | 3,066 | 97 | 69 | 2 | 13 | 0 | 2 | 0 | 1 | 0 | 7 | 0 | 3,149 | 100 | 15 | 1 | 4 | 0 | 0 | 0 | 0 | 0 | 11 | 0 | 3,134 | 100 | 3,049 | 97 | 3,149 | 100 | 3,148 | 100 | 1,145 | 36 | 750 | 24 | 1,091 | 35 | 854 | 27 | 73 | 2 | 38 | 1 | 89 | 3 | 48 | 2 | 1 | 0 | 0 | 0 | 1 | 0 | 1,145 | 100 | 883 | 77 | 445 | 39 | 750 | 66 | 369 | 32 | 93 | 8 | 52 | 5 | 262 | 23 | 228 | 20 | 91 | 8 | 475 | 42 | 247 | 22 | 3 | (X) | 3 | (X) | 1,246 | 100 | 1,145 | 92 | 101 | 8 | 19 | 2 | 1 | (X) | 7 | (X) | 1,145 | 100 | 1,041 | 91 | 104 | 9 | 3 | (X) | 3 | (X) |
5 | 1400000US22001960400 | 22001960400 | Census Tract 9604, Acadia Parish, Louisiana | 5,617 | 100 | 2,754 | 49 | 2,863 | 51 | 429 | 8 | 406 | 7 | 520 | 9 | 476 | 9 | 353 | 6 | 691 | 12 | 914 | 16 | 684 | 12 | 254 | 5 | 222 | 4 | 410 | 7 | 193 | 3 | 65 | 1 | 34 | (X) | 3,944 | 70 | 1,911 | 34 | 2,033 | 36 | 3,716 | 66 | 800 | 14 | 668 | 12 | 302 | 5 | 366 | 7 | 5,583 | 99 | 5,347 | 95 | 207 | 4 | 18 | 0 | 6 | 0 | 1 | 0 | 1 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | 34 | 1 | 5,381 | 96 | 211 | 4 | 43 | 1 | 8 | 0 | 0 | 0 | 9 | 0 | 5,617 | 100 | 43 | 1 | 24 | 0 | 0 | 0 | 0 | 0 | 19 | 0 | 5,574 | 99 | 5,307 | 95 | 5,617 | 100 | 5,592 | 100 | 1,991 | 35 | 1,291 | 23 | 1,994 | 36 | 1,554 | 28 | 196 | 4 | 101 | 2 | 120 | 2 | 70 | 1 | 25 | 0 | 10 | 0 | 15 | 0 | 1,991 | 100 | 1,555 | 78 | 804 | 40 | 1,291 | 65 | 641 | 32 | 168 | 8 | 99 | 5 | 436 | 22 | 388 | 20 | 189 | 10 | 861 | 43 | 482 | 24 | 3 | (X) | 3 | (X) | 2,176 | 100 | 1,991 | 92 | 185 | 9 | 23 | 1 | 1 | (X) | 6 | (X) | 1,991 | 100 | 1,630 | 82 | 361 | 18 | 3 | (X) | 3 | (X) |
6 | 1400000US22001960500 | 22001960500 | Census Tract 9605, Acadia Parish, Louisiana | 4,927 | 100 | 2,461 | 50 | 2,466 | 50 | 400 | 8 | 438 | 9 | 439 | 9 | 418 | 9 | 319 | 7 | 704 | 14 | 777 | 16 | 644 | 13 | 227 | 5 | 154 | 3 | 234 | 5 | 134 | 3 | 39 | 1 | 32 | (X) | 3,405 | 69 | 1,675 | 34 | 1,730 | 35 | 3,162 | 64 | 499 | 10 | 407 | 8 | 167 | 3 | 240 | 5 | 4,901 | 100 | 4,498 | 91 | 378 | 8 | 15 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 5 | 0 | 26 | 1 | 4,524 | 92 | 385 | 8 | 35 | 1 | 4 | 0 | 1 | 0 | 9 | 0 | 4,927 | 100 | 61 | 1 | 37 | 1 | 1 | 0 | 0 | 0 | 23 | 1 | 4,866 | 99 | 4,448 | 90 | 4,927 | 100 | 4,921 | 100 | 1,692 | 34 | 1,068 | 22 | 1,809 | 37 | 1,418 | 29 | 174 | 4 | 71 | 1 | 178 | 4 | 100 | 2 | 6 | 0 | 0 | 0 | 6 | 0 | 1,692 | 100 | 1,326 | 78 | 762 | 45 | 1,068 | 63 | 611 | 36 | 182 | 11 | 103 | 6 | 366 | 22 | 300 | 18 | 136 | 8 | 808 | 48 | 316 | 19 | 3 | (X) | 3 | (X) | 1,796 | 100 | 1,692 | 94 | 104 | 6 | 22 | 1 | 1 | (X) | 5 | (X) | 1,692 | 100 | 1,419 | 84 | 273 | 16 | 3 | (X) | 3 | (X) |
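With this many columns, the printed table is hard to scan. As a rough sketch, the functions `dim`, `names` and row indexing give a more compact view. The example below runs on a made-up two-column miniature (`toy_csv`, a hypothetical name) that imitates the layout of the file, rather than on the census data itself.

```r
# Made-up miniature of the layout: a line of codes, a line of labels, then data.
toy_csv <- "GEO.id2,HC01_VC01\nId2,Number; Total population\n22001960100,6188\n22001960200,5056"

toy <- read.csv(text = toy_csv)

dim(toy)    # number of rows, then number of columns
names(toy)  # column names, taken from the first line only
toy[1, ]    # the first row of "data" holds labels, not numbers
```

The same three calls work on `census2000` and print far less than `head` does.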
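Upon inspection, we can see that the file came with two header rows: a row of short codes followed by a row of descriptive labels. By default, R treats the first row of a CSV as the header, so the descriptive labels were read in as the first row of data. We do not need the row of codes, so we can rerun the `read.csv` command and tell it to skip that first line: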
census2000 <- read.csv('2000_census_demographic_profile.csv', skip = 1)
head(census2000)
Id | Id2 | Geography | Number..Total.population | Percent..Total.population | Number..Total.population...SEX.AND.AGE...Male | Percent..Total.population...SEX.AND.AGE...Male | Number..Total.population...SEX.AND.AGE...Female | Percent..Total.population...SEX.AND.AGE...Female | Number..Total.population...SEX.AND.AGE...Under.5.years | Percent..Total.population...SEX.AND.AGE...Under.5.years | Number..Total.population...SEX.AND.AGE...5.to.9.years | Percent..Total.population...SEX.AND.AGE...5.to.9.years | Number..Total.population...SEX.AND.AGE...10.to.14.years | Percent..Total.population...SEX.AND.AGE...10.to.14.years | Number..Total.population...SEX.AND.AGE...15.to.19.years | Percent..Total.population...SEX.AND.AGE...15.to.19.years | Number..Total.population...SEX.AND.AGE...20.to.24.years | Percent..Total.population...SEX.AND.AGE...20.to.24.years | Number..Total.population...SEX.AND.AGE...25.to.34.years | Percent..Total.population...SEX.AND.AGE...25.to.34.years | Number..Total.population...SEX.AND.AGE...35.to.44.years | Percent..Total.population...SEX.AND.AGE...35.to.44.years | Number..Total.population...SEX.AND.AGE...45.to.54.years | Percent..Total.population...SEX.AND.AGE...45.to.54.years | Number..Total.population...SEX.AND.AGE...55.to.59.years | Percent..Total.population...SEX.AND.AGE...55.to.59.years | Number..Total.population...SEX.AND.AGE...60.to.64.years | Percent..Total.population...SEX.AND.AGE...60.to.64.years | Number..Total.population...SEX.AND.AGE...65.to.74.years | Percent..Total.population...SEX.AND.AGE...65.to.74.years | Number..Total.population...SEX.AND.AGE...75.to.84.years | Percent..Total.population...SEX.AND.AGE...75.to.84.years | Number..Total.population...SEX.AND.AGE...85.years.and.over | Percent..Total.population...SEX.AND.AGE...85.years.and.over | Number..Total.population...SEX.AND.AGE...Median.age..years. | Percent..Total.population...SEX.AND.AGE...Median.age..years. 
| Number..Total.population...SEX.AND.AGE...18.years.and.over | Percent..Total.population...SEX.AND.AGE...18.years.and.over | Number..Total.population...SEX.AND.AGE...18.years.and.over...Male | Percent..Total.population...SEX.AND.AGE...18.years.and.over...Male | Number..Total.population...SEX.AND.AGE...18.years.and.over...Female | Percent..Total.population...SEX.AND.AGE...18.years.and.over...Female | Number..Total.population...SEX.AND.AGE...21.years.and.over | Percent..Total.population...SEX.AND.AGE...21.years.and.over | Number..Total.population...SEX.AND.AGE...62.years.and.over | Percent..Total.population...SEX.AND.AGE...62.years.and.over | Number..Total.population...SEX.AND.AGE...65.years.and.over | Percent..Total.population...SEX.AND.AGE...65.years.and.over | Number..Total.population...SEX.AND.AGE...65.years.and.over...Male | Percent..Total.population...SEX.AND.AGE...65.years.and.over...Male | Number..Total.population...SEX.AND.AGE...65.years.and.over...Female | Percent..Total.population...SEX.AND.AGE...65.years.and.over...Female | Number..Total.population...RACE...One.race | Percent..Total.population...RACE...One.race | Number..Total.population...RACE...One.race...White | Percent..Total.population...RACE...One.race...White | Number..Total.population...RACE...One.race...Black.or.African.American | Percent..Total.population...RACE...One.race...Black.or.African.American | Number..Total.population...RACE...One.race...American.Indian.and.Alaska.Native | Percent..Total.population...RACE...One.race...American.Indian.and.Alaska.Native | Number..Total.population...RACE...One.race...Asian | Percent..Total.population...RACE...One.race...Asian | Number..Total.population...RACE...One.race...Asian...Asian.Indian | Percent..Total.population...RACE...One.race...Asian...Asian.Indian | Number..Total.population...RACE...One.race...Asian...Chinese | Percent..Total.population...RACE...One.race...Asian...Chinese | Number..Total.population...RACE...One.race...Asian...Filipino | 
Percent..Total.population...RACE...One.race...Asian...Filipino | Number..Total.population...RACE...One.race...Asian...Japanese | Percent..Total.population...RACE...One.race...Asian...Japanese | Number..Total.population...RACE...One.race...Asian...Korean | Percent..Total.population...RACE...One.race...Asian...Korean | Number..Total.population...RACE...One.race...Asian...Vietnamese | Percent..Total.population...RACE...One.race...Asian...Vietnamese | Number..Total.population...RACE...One.race...Asian...Other.Asian..1. | Percent..Total.population...RACE...One.race...Asian...Other.Asian..1. | Number..Total.population...RACE...One.race...Native.Hawaiian.and.Other.Pacific.Islander | Percent..Total.population...RACE...One.race...Native.Hawaiian.and.Other.Pacific.Islander | Number..Total.population...RACE...One.race...Native.Hawaiian.and.Other.Pacific.Islander...Native.Hawaiian | Percent..Total.population...RACE...One.race...Native.Hawaiian.and.Other.Pacific.Islander...Native.Hawaiian | Number..Total.population...RACE...One.race...Native.Hawaiian.and.Other.Pacific.Islander...Guamanian.or.Chamorro | Percent..Total.population...RACE...One.race...Native.Hawaiian.and.Other.Pacific.Islander...Guamanian.or.Chamorro | Number..Total.population...RACE...One.race...Native.Hawaiian.and.Other.Pacific.Islander...Samoan | Percent..Total.population...RACE...One.race...Native.Hawaiian.and.Other.Pacific.Islander...Samoan | Number..Total.population...RACE...One.race...Native.Hawaiian.and.Other.Pacific.Islander...Other.Pacific.Islander..2. | Percent..Total.population...RACE...One.race...Native.Hawaiian.and.Other.Pacific.Islander...Other.Pacific.Islander..2. 
| Number..Total.population...RACE...One.race...Some.other.race | Percent..Total.population...RACE...One.race...Some.other.race | Number..Total.population...RACE...Two.or.more.races | Percent..Total.population...RACE...Two.or.more.races | Number..Total.population...RACE...Race.alone.or.in.combination.with.one.or.more.other.races..3....White | Percent..Total.population...RACE...Race.alone.or.in.combination.with.one.or.more.other.races..3....White | Number..Total.population...RACE...Race.alone.or.in.combination.with.one.or.more.other.races..3....Black.or.African.American | Percent..Total.population...RACE...Race.alone.or.in.combination.with.one.or.more.other.races..3....Black.or.African.American | Number..Total.population...RACE...Race.alone.or.in.combination.with.one.or.more.other.races..3....American.Indian.and.Alaska.Native | Percent..Total.population...RACE...Race.alone.or.in.combination.with.one.or.more.other.races..3....American.Indian.and.Alaska.Native | Number..Total.population...RACE...Race.alone.or.in.combination.with.one.or.more.other.races..3....Asian | Percent..Total.population...RACE...Race.alone.or.in.combination.with.one.or.more.other.races..3....Asian | Number..Total.population...RACE...Race.alone.or.in.combination.with.one.or.more.other.races..3....Native.Hawaiian.and.Other.Pacific.Islander | Percent..Total.population...RACE...Race.alone.or.in.combination.with.one.or.more.other.races..3....Native.Hawaiian.and.Other.Pacific.Islander | Number..Total.population...RACE...Race.alone.or.in.combination.with.one.or.more.other.races..3....Some.other.race | Percent..Total.population...RACE...Race.alone.or.in.combination.with.one.or.more.other.races..3....Some.other.race | Number..HISPANIC.OR.LATINO.AND.RACE...Total.population | Percent..HISPANIC.OR.LATINO.AND.RACE...Total.population | Number..HISPANIC.OR.LATINO.AND.RACE...Total.population...Hispanic.or.Latino..of.any.race. 
| Percent..HISPANIC.OR.LATINO.AND.RACE...Total.population...Hispanic.or.Latino..of.any.race. | Number..HISPANIC.OR.LATINO.AND.RACE...Total.population...Hispanic.or.Latino..of.any.race....Mexican | Percent..HISPANIC.OR.LATINO.AND.RACE...Total.population...Hispanic.or.Latino..of.any.race....Mexican | Number..HISPANIC.OR.LATINO.AND.RACE...Total.population...Hispanic.or.Latino..of.any.race....Puerto.Rican | Percent..HISPANIC.OR.LATINO.AND.RACE...Total.population...Hispanic.or.Latino..of.any.race....Puerto.Rican | Number..HISPANIC.OR.LATINO.AND.RACE...Total.population...Hispanic.or.Latino..of.any.race....Cuban | Percent..HISPANIC.OR.LATINO.AND.RACE...Total.population...Hispanic.or.Latino..of.any.race....Cuban | Number..HISPANIC.OR.LATINO.AND.RACE...Total.population...Hispanic.or.Latino..of.any.race....Other.Hispanic.or.Latino | Percent..HISPANIC.OR.LATINO.AND.RACE...Total.population...Hispanic.or.Latino..of.any.race....Other.Hispanic.or.Latino | Number..HISPANIC.OR.LATINO.AND.RACE...Total.population...Not.Hispanic.or.Latino | Percent..HISPANIC.OR.LATINO.AND.RACE...Total.population...Not.Hispanic.or.Latino | Number..HISPANIC.OR.LATINO.AND.RACE...Total.population...Not.Hispanic.or.Latino...White.alone | Percent..HISPANIC.OR.LATINO.AND.RACE...Total.population...Not.Hispanic.or.Latino...White.alone | Number..RELATIONSHIP...Total.population | Percent..RELATIONSHIP...Total.population | Number..RELATIONSHIP...Total.population...In.households | Percent..RELATIONSHIP...Total.population...In.households | Number..RELATIONSHIP...Total.population...In.households...Householder | Percent..RELATIONSHIP...Total.population...In.households...Householder | Number..RELATIONSHIP...Total.population...In.households...Spouse | Percent..RELATIONSHIP...Total.population...In.households...Spouse | Number..RELATIONSHIP...Total.population...In.households...Child | Percent..RELATIONSHIP...Total.population...In.households...Child | 
Number..RELATIONSHIP...Total.population...In.households...Child...Own.child.under.18.years | Percent..RELATIONSHIP...Total.population...In.households...Child...Own.child.under.18.years | Number..RELATIONSHIP...Total.population...In.households...Other.relatives | Percent..RELATIONSHIP...Total.population...In.households...Other.relatives | Number..RELATIONSHIP...Total.population...In.households...Other.relatives...Under.18.years | Percent..RELATIONSHIP...Total.population...In.households...Other.relatives...Under.18.years | Number..RELATIONSHIP...Total.population...In.households...Nonrelatives | Percent..RELATIONSHIP...Total.population...In.households...Nonrelatives | Number..RELATIONSHIP...Total.population...In.households...Nonrelatives...Unmarried.partner | Percent..RELATIONSHIP...Total.population...In.households...Nonrelatives...Unmarried.partner | Number..RELATIONSHIP...Total.population...In.group.quarters | Percent..RELATIONSHIP...Total.population...In.group.quarters | Number..RELATIONSHIP...Total.population...In.group.quarters...Institutionalized.population | Percent..RELATIONSHIP...Total.population...In.group.quarters...Institutionalized.population | Number..RELATIONSHIP...Total.population...In.group.quarters...Noninstitutionalized.population | Percent..RELATIONSHIP...Total.population...In.group.quarters...Noninstitutionalized.population | Number..HOUSEHOLDS.BY.TYPE...Total.households | Percent..HOUSEHOLDS.BY.TYPE...Total.households | Number..HOUSEHOLDS.BY.TYPE...Total.households...Family.households..families. | Percent..HOUSEHOLDS.BY.TYPE...Total.households...Family.households..families. 
| Number..HOUSEHOLDS.BY.TYPE...Total.households...Family.households..families....With.own.children.under.18.years | Percent..HOUSEHOLDS.BY.TYPE...Total.households...Family.households..families....With.own.children.under.18.years | Number..HOUSEHOLDS.BY.TYPE...Total.households...Family.households..families....Married.couple.family | Percent..HOUSEHOLDS.BY.TYPE...Total.households...Family.households..families....Married.couple.family | Number..HOUSEHOLDS.BY.TYPE...Total.households...Family.households..families....Married.couple.family...With.own.children.under.18.years | Percent..HOUSEHOLDS.BY.TYPE...Total.households...Family.households..families....Married.couple.family...With.own.children.under.18.years | Number..HOUSEHOLDS.BY.TYPE...Total.households...Family.households..families....Female.householder..no.husband.present | Percent..HOUSEHOLDS.BY.TYPE...Total.households...Family.households..families....Female.householder..no.husband.present | Number..HOUSEHOLDS.BY.TYPE...Total.households...Family.households..families....Female.householder..no.husband.present...With.own.children.under.18.years | Percent..HOUSEHOLDS.BY.TYPE...Total.households...Family.households..families....Female.householder..no.husband.present...With.own.children.under.18.years | Number..HOUSEHOLDS.BY.TYPE...Total.households...Nonfamily.households | Percent..HOUSEHOLDS.BY.TYPE...Total.households...Nonfamily.households | Number..HOUSEHOLDS.BY.TYPE...Total.households...Nonfamily.households...Householder.living.alone | Percent..HOUSEHOLDS.BY.TYPE...Total.households...Nonfamily.households...Householder.living.alone | Number..HOUSEHOLDS.BY.TYPE...Total.households...Nonfamily.households...Householder.living.alone...Householder.65.years.and.over | Percent..HOUSEHOLDS.BY.TYPE...Total.households...Nonfamily.households...Householder.living.alone...Householder.65.years.and.over | Number..HOUSEHOLDS.BY.TYPE...Total.households...Households.with.individuals.under.18.years | 
Percent..HOUSEHOLDS.BY.TYPE...Total.households...Households.with.individuals.under.18.years | Number..HOUSEHOLDS.BY.TYPE...Total.households...Households.with.individuals.65.years.and.over | Percent..HOUSEHOLDS.BY.TYPE...Total.households...Households.with.individuals.65.years.and.over | Number..HOUSEHOLDS.BY.TYPE...Total.households...Average.household.size | Percent..HOUSEHOLDS.BY.TYPE...Total.households...Average.household.size | Number..HOUSEHOLDS.BY.TYPE...Total.households...Average.family.size | Percent..HOUSEHOLDS.BY.TYPE...Total.households...Average.family.size | Number..HOUSING.OCCUPANCY...Total.housing.units | Percent..HOUSING.OCCUPANCY...Total.housing.units | Number..HOUSING.OCCUPANCY...Total.housing.units...Occupied.housing.units | Percent..HOUSING.OCCUPANCY...Total.housing.units...Occupied.housing.units | Number..HOUSING.OCCUPANCY...Total.housing.units...Vacant.housing.units | Percent..HOUSING.OCCUPANCY...Total.housing.units...Vacant.housing.units | Number..HOUSING.OCCUPANCY...Total.housing.units...Vacant.housing.units...For.seasonal..recreational..or.occasional.use | Percent..HOUSING.OCCUPANCY...Total.housing.units...Vacant.housing.units...For.seasonal..recreational..or.occasional.use | Number..HOUSING.OCCUPANCY...Total.housing.units...Homeowner.vacancy.rate..percent. | Percent..HOUSING.OCCUPANCY...Total.housing.units...Homeowner.vacancy.rate..percent. | Number..HOUSING.OCCUPANCY...Total.housing.units...Rental.vacancy.rate..percent. | Percent..HOUSING.OCCUPANCY...Total.housing.units...Rental.vacancy.rate..percent. 
| Number..HOUSING.TENURE...Occupied.housing.units | Percent..HOUSING.TENURE...Occupied.housing.units | Number..HOUSING.TENURE...Occupied.housing.units...Owner.occupied.housing.units | Percent..HOUSING.TENURE...Occupied.housing.units...Owner.occupied.housing.units | Number..HOUSING.TENURE...Occupied.housing.units...Renter.occupied.housing.units | Percent..HOUSING.TENURE...Occupied.housing.units...Renter.occupied.housing.units | Number..HOUSING.TENURE...Occupied.housing.units...Average.household.size.of.owner.occupied.unit | Percent..HOUSING.TENURE...Occupied.housing.units...Average.household.size.of.owner.occupied.unit | Number..HOUSING.TENURE...Occupied.housing.units...Average.household.size.of.renter.occupied.unit | Percent..HOUSING.TENURE...Occupied.housing.units...Average.household.size.of.renter.occupied.unit | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1400000US22001960100 | 22001960100 | Census Tract 9601, Acadia Parish, Louisiana | 6,188 | 100 | 2,920 | 47 | 3,268 | 53 | 462 | 8 | 502 | 8 | 541 | 9 | 572 | 9 | 375 | 6 | 728 | 12 | 913 | 15 | 699 | 11 | 301 | 5 | 252 | 4 | 433 | 7 | 287 | 5 | 123 | 2 | 34 | (X) | 4,304 | 70 | 1,957 | 32 | 2,347 | 38 | 4,031 | 65 | 996 | 16 | 843 | 14 | 295 | 5 | 548 | 9 | 6,174 | 100 | 4,455 | 72 | 1,675 | 27 | 12 | 0 | 7 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 25 | 0 | 14 | 0 | 4,468 | 72 | 1,677 | 27 | 19 | 0 | 8 | 0 | 0 | 0 | 30 | 1 | 6,188 | 100 | 87 | 1 | 51 | 1 | 1 | 0 | 0 | 0 | 35 | 1 | 6,101 | 99 | 4,398 | 71 | 6,188 | 100 | 6,030 | 97 | 2,236 | 36 | 1,119 | 18 | 2,199 | 36 | 1,700 | 28 | 274 | 4 | 143 | 2 | 202 | 3 | 109 | 2 | 158 | 3 | 151 | 2 | 7 | 0 | 2,236 | 100 | 1,595 | 71 | 868 | 39 | 1,119 | 50 | 573 | 26 | 363 | 16 | 237 | 11 | 641 | 29 | 585 | 26 | 303 | 14 | 962 | 43 | 569 | 25 | 3 | (X) | 3 | (X) | 2,410 | 100 | 2,236 | 93 | 174 | 7 | 15 | 1 | 1 | (X) | 8 | (X) | 2,236 | 100 | 1,526 | 68 | 710 | 32 | 3 | (X) | 3 | (X) |
2 | 1400000US22001960200 | 22001960200 | Census Tract 9602, Acadia Parish, Louisiana | 5,056 | 100 | 2,562 | 51 | 2,494 | 49 | 346 | 7 | 416 | 8 | 476 | 9 | 463 | 9 | 298 | 6 | 579 | 12 | 861 | 17 | 709 | 14 | 250 | 5 | 203 | 4 | 263 | 5 | 150 | 3 | 42 | 1 | 34 | (X) | 3,527 | 70 | 1,758 | 35 | 1,769 | 35 | 3,289 | 65 | 570 | 11 | 455 | 9 | 217 | 4 | 238 | 5 | 5,035 | 100 | 4,799 | 95 | 216 | 4 | 6 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 13 | 0 | 21 | 0 | 4,816 | 95 | 226 | 5 | 13 | 0 | 6 | 0 | 4 | 0 | 16 | 0 | 5,056 | 100 | 35 | 1 | 18 | 0 | 0 | 0 | 0 | 0 | 17 | 0 | 5,021 | 99 | 4,775 | 94 | 5,056 | 100 | 5,056 | 100 | 1,764 | 35 | 1,216 | 24 | 1,791 | 35 | 1,413 | 28 | 173 | 3 | 86 | 2 | 112 | 2 | 61 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1,764 | 100 | 1,408 | 80 | 722 | 41 | 1,216 | 69 | 617 | 35 | 134 | 8 | 81 | 5 | 356 | 20 | 310 | 18 | 128 | 7 | 781 | 44 | 339 | 19 | 3 | (X) | 3 | (X) | 1,909 | 100 | 1,764 | 92 | 145 | 8 | 31 | 2 | 1 | (X) | 7 | (X) | 1,764 | 100 | 1,461 | 83 | 303 | 17 | 3 | (X) | 3 | (X) |
3 | 1400000US22001960300 | 22001960300 | Census Tract 9603, Acadia Parish, Louisiana | 3,149 | 100 | 1,593 | 51 | 1,556 | 49 | 209 | 7 | 251 | 8 | 305 | 10 | 260 | 8 | 204 | 7 | 368 | 12 | 520 | 17 | 409 | 13 | 148 | 5 | 130 | 4 | 209 | 7 | 104 | 3 | 32 | 1 | 35 | (X) | 2,233 | 71 | 1,103 | 35 | 1,130 | 36 | 2,081 | 66 | 435 | 14 | 345 | 11 | 150 | 5 | 195 | 6 | 3,140 | 100 | 3,058 | 97 | 67 | 2 | 8 | 0 | 2 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0 | 9 | 0 | 3,066 | 97 | 69 | 2 | 13 | 0 | 2 | 0 | 1 | 0 | 7 | 0 | 3,149 | 100 | 15 | 1 | 4 | 0 | 0 | 0 | 0 | 0 | 11 | 0 | 3,134 | 100 | 3,049 | 97 | 3,149 | 100 | 3,148 | 100 | 1,145 | 36 | 750 | 24 | 1,091 | 35 | 854 | 27 | 73 | 2 | 38 | 1 | 89 | 3 | 48 | 2 | 1 | 0 | 0 | 0 | 1 | 0 | 1,145 | 100 | 883 | 77 | 445 | 39 | 750 | 66 | 369 | 32 | 93 | 8 | 52 | 5 | 262 | 23 | 228 | 20 | 91 | 8 | 475 | 42 | 247 | 22 | 3 | (X) | 3 | (X) | 1,246 | 100 | 1,145 | 92 | 101 | 8 | 19 | 2 | 1 | (X) | 7 | (X) | 1,145 | 100 | 1,041 | 91 | 104 | 9 | 3 | (X) | 3 | (X) |
4 | 1400000US22001960400 | 22001960400 | Census Tract 9604, Acadia Parish, Louisiana | 5,617 | 100 | 2,754 | 49 | 2,863 | 51 | 429 | 8 | 406 | 7 | 520 | 9 | 476 | 9 | 353 | 6 | 691 | 12 | 914 | 16 | 684 | 12 | 254 | 5 | 222 | 4 | 410 | 7 | 193 | 3 | 65 | 1 | 34 | (X) | 3,944 | 70 | 1,911 | 34 | 2,033 | 36 | 3,716 | 66 | 800 | 14 | 668 | 12 | 302 | 5 | 366 | 7 | 5,583 | 99 | 5,347 | 95 | 207 | 4 | 18 | 0 | 6 | 0 | 1 | 0 | 1 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | 34 | 1 | 5,381 | 96 | 211 | 4 | 43 | 1 | 8 | 0 | 0 | 0 | 9 | 0 | 5,617 | 100 | 43 | 1 | 24 | 0 | 0 | 0 | 0 | 0 | 19 | 0 | 5,574 | 99 | 5,307 | 95 | 5,617 | 100 | 5,592 | 100 | 1,991 | 35 | 1,291 | 23 | 1,994 | 36 | 1,554 | 28 | 196 | 4 | 101 | 2 | 120 | 2 | 70 | 1 | 25 | 0 | 10 | 0 | 15 | 0 | 1,991 | 100 | 1,555 | 78 | 804 | 40 | 1,291 | 65 | 641 | 32 | 168 | 8 | 99 | 5 | 436 | 22 | 388 | 20 | 189 | 10 | 861 | 43 | 482 | 24 | 3 | (X) | 3 | (X) | 2,176 | 100 | 1,991 | 92 | 185 | 9 | 23 | 1 | 1 | (X) | 6 | (X) | 1,991 | 100 | 1,630 | 82 | 361 | 18 | 3 | (X) | 3 | (X) |
5 | 1400000US22001960500 | 22001960500 | Census Tract 9605, Acadia Parish, Louisiana | 4,927 | 100 | 2,461 | 50 | 2,466 | 50 | 400 | 8 | 438 | 9 | 439 | 9 | 418 | 9 | 319 | 7 | 704 | 14 | 777 | 16 | 644 | 13 | 227 | 5 | 154 | 3 | 234 | 5 | 134 | 3 | 39 | 1 | 32 | (X) | 3,405 | 69 | 1,675 | 34 | 1,730 | 35 | 3,162 | 64 | 499 | 10 | 407 | 8 | 167 | 3 | 240 | 5 | 4,901 | 100 | 4,498 | 91 | 378 | 8 | 15 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 5 | 0 | 26 | 1 | 4,524 | 92 | 385 | 8 | 35 | 1 | 4 | 0 | 1 | 0 | 9 | 0 | 4,927 | 100 | 61 | 1 | 37 | 1 | 1 | 0 | 0 | 0 | 23 | 1 | 4,866 | 99 | 4,448 | 90 | 4,927 | 100 | 4,921 | 100 | 1,692 | 34 | 1,068 | 22 | 1,809 | 37 | 1,418 | 29 | 174 | 4 | 71 | 1 | 178 | 4 | 100 | 2 | 6 | 0 | 0 | 0 | 6 | 0 | 1,692 | 100 | 1,326 | 78 | 762 | 45 | 1,068 | 63 | 611 | 36 | 182 | 11 | 103 | 6 | 366 | 22 | 300 | 18 | 136 | 8 | 808 | 48 | 316 | 19 | 3 | (X) | 3 | (X) | 1,796 | 100 | 1,692 | 94 | 104 | 6 | 22 | 1 | 1 | (X) | 5 | (X) | 1,692 | 100 | 1,419 | 84 | 273 | 16 | 3 | (X) | 3 | (X) |
6 | 1400000US22001960600 | 22001960600 | Census Tract 9606, Acadia Parish, Louisiana | 5,654 | 100 | 2,647 | 47 | 3,007 | 53 | 464 | 8 | 471 | 8 | 442 | 8 | 460 | 8 | 358 | 6 | 760 | 13 | 871 | 15 | 615 | 11 | 243 | 4 | 209 | 4 | 415 | 7 | 241 | 4 | 105 | 2 | 33 | (X) | 3,999 | 71 | 1,791 | 32 | 2,208 | 39 | 3,736 | 66 | 869 | 15 | 761 | 14 | 271 | 5 | 490 | 9 | 5,620 | 99 | 4,809 | 85 | 782 | 14 | 7 | 0 | 12 | 0 | 0 | 0 | 3 | 0 | 2 | 0 | 0 | 0 | 1 | 0 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10 | 0 | 34 | 1 | 4,842 | 86 | 792 | 14 | 18 | 0 | 15 | 0 | 0 | 0 | 21 | 0 | 5,654 | 100 | 49 | 1 | 28 | 1 | 0 | 0 | 0 | 0 | 21 | 0 | 5,605 | 99 | 4,774 | 84 | 5,654 | 100 | 5,526 | 98 | 2,073 | 37 | 1,076 | 19 | 1,891 | 33 | 1,478 | 26 | 289 | 5 | 160 | 3 | 197 | 4 | 110 | 2 | 128 | 2 | 128 | 2 | 0 | 0 | 2,073 | 100 | 1,477 | 71 | 796 | 38 | 1,076 | 52 | 541 | 26 | 310 | 15 | 189 | 9 | 596 | 29 | 521 | 25 | 243 | 12 | 882 | 43 | 510 | 25 | 3 | (X) | 3 | (X) | 2,292 | 100 | 2,073 | 90 | 219 | 10 | 11 | 1 | 1 | (X) | 14 | (X) | 2,073 | 100 | 1,474 | 71 | 599 | 29 | 3 | (X) | 3 | (X) |
Visually, we can see that this data set is very wide. In fact, there are 195 columns.
By default, R converts column names into syntactically valid variable names when it reads a CSV. That's why spaces and other special characters have been automatically converted to periods, as in Number..Total.population.
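As a minimal sketch of that conversion (the header text below is a made-up stand-in, not necessarily the exact header in the Census file), base R's make.names shows what happens to a name with spaces and punctuation, and the check.names argument of read.csv lets you switch the conversion off:

```r
# Hypothetical header text, used only for illustration
raw.header <- 'Number; Total population'

# make.names() replaces characters that are not valid in R names with periods
clean.header <- make.names(raw.header)
print(clean.header)

# A tiny CSV held in a string instead of a file
csv.text <- 'Id2,Total population\n22001960100,6188'

# Default behavior: the space in the header becomes a period
converted <- read.csv(text = csv.text)
print(colnames(converted))

# check.names = FALSE keeps the header exactly as written
preserved <- read.csv(text = csv.text, check.names = FALSE)
print(colnames(preserved))
```

Keeping the original names is possible, but names with spaces then have to be wrapped in backticks every time they are typed, so the converted names are usually easier to work with.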
Let's keep a handful of these:
Id2: This is what the Census Bureau calls a FIPS code. It is a unique numerical identifier for each census tract. This will be important when we join our two datasets together.
Geography: This is a text description of the tract, with the parish name.
Number..Total.population: The total population of the tract.
Number..HOUSING.OCCUPANCY...Total.housing.units, Number..HOUSING.OCCUPANCY...Total.housing.units...Occupied.housing.units, and Number..HOUSING.OCCUPANCY...Total.housing.units...Vacant.housing.units: The total, occupied and vacant housing units.
To help us trim the data set to just these six columns, we are going to import a package. There are thousands of packages for R created by the open-source community, which help improve on what is included in R by default.
The one we will use here is called dplyr.
## if dplyr was not installed we would have to run this
# install.packages('dplyr')
## to import the package and all of its functions
library('dplyr')
Warning message:
: package ‘dplyr’ was built under R version 3.2.4
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
    filter, lag
The following objects are masked from ‘package:base’:
    intersect, setdiff, setequal, union
From dplyr, we will use the select function to trim the data set and save it to a new variable called census2000.trimmed:
census2000.trimmed <- select(
    census2000, # name of the data frame
    # list of the six column names we want to keep
    Id2,
    Geography,
    Number..Total.population,
    Number..HOUSING.OCCUPANCY...Total.housing.units,
    Number..HOUSING.OCCUPANCY...Total.housing.units...Occupied.housing.units,
    Number..HOUSING.OCCUPANCY...Total.housing.units...Vacant.housing.units
)
head(census2000.trimmed)
| | Id2 | Geography | Number..Total.population | Number..HOUSING.OCCUPANCY...Total.housing.units | Number..HOUSING.OCCUPANCY...Total.housing.units...Occupied.housing.units | Number..HOUSING.OCCUPANCY...Total.housing.units...Vacant.housing.units |
---|---|---|---|---|---|---|
1 | 22001960100 | Census Tract 9601, Acadia Parish, Louisiana | 6,188 | 2,410 | 2,236 | 174 |
2 | 22001960200 | Census Tract 9602, Acadia Parish, Louisiana | 5,056 | 1,909 | 1,764 | 145 |
3 | 22001960300 | Census Tract 9603, Acadia Parish, Louisiana | 3,149 | 1,246 | 1,145 | 101 |
4 | 22001960400 | Census Tract 9604, Acadia Parish, Louisiana | 5,617 | 2,176 | 1,991 | 185 |
5 | 22001960500 | Census Tract 9605, Acadia Parish, Louisiana | 4,927 | 1,796 | 1,692 | 104 |
6 | 22001960600 | Census Tract 9606, Acadia Parish, Louisiana | 5,654 | 2,292 | 2,073 | 219 |
This shows us that we were able to select the columns correctly. But one lingering issue is that these column names are long and unwieldy. Since we are going to be typing them often, let's rename them to shorter, more convenient versions:
colnames(census2000.trimmed) <- c(
    'fips.code', 'geography', 'population',
    'total.housing.units', 'occupied.housing.units', 'vacant.housing.units'
)
head(census2000.trimmed)
| | fips.code | geography | population | total.housing.units | occupied.housing.units | vacant.housing.units |
---|---|---|---|---|---|---|
1 | 22001960100 | Census Tract 9601, Acadia Parish, Louisiana | 6,188 | 2,410 | 2,236 | 174 |
2 | 22001960200 | Census Tract 9602, Acadia Parish, Louisiana | 5,056 | 1,909 | 1,764 | 145 |
3 | 22001960300 | Census Tract 9603, Acadia Parish, Louisiana | 3,149 | 1,246 | 1,145 | 101 |
4 | 22001960400 | Census Tract 9604, Acadia Parish, Louisiana | 5,617 | 2,176 | 1,991 | 185 |
5 | 22001960500 | Census Tract 9605, Acadia Parish, Louisiana | 4,927 | 1,796 | 1,692 | 104 |
6 | 22001960600 | Census Tract 9606, Acadia Parish, Louisiana | 5,654 | 2,292 | 2,073 | 219 |
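As an aside, dplyr's select can also rename columns while it picks them, using the form new.name = old.name. The sketch below is an alternative to the two-step approach above, not the approach this workshop takes; it runs on a toy data frame whose Extra.column is invented for illustration:

```r
library('dplyr')

# Toy data frame standing in for the census data
# (values copied from the first two rows above; Extra.column is made up)
toy <- data.frame(
    Id2 = c(22001960100, 22001960200),
    Number..Total.population = c('6,188', '5,056'),
    Extra.column = c('a', 'b')
)

# new.name = old.name keeps the column and renames it in one step;
# columns that are not listed (Extra.column here) are dropped
toy.trimmed <- select(
    toy,
    fips.code = Id2,
    population = Number..Total.population
)

print(colnames(toy.trimmed))
```

Renaming inside select ties each new name to a specific old column, which avoids relying on the column order the way assigning to colnames does.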
Another helpful command to run on any data set is str, which gives you the structure of the variable as defined by R:
str(census2000.trimmed)
'data.frame': 1106 obs. of 6 variables:
 $ fips.code             : num 2.2e+10 2.2e+10 2.2e+10 2.2e+10 2.2e+10 ...
 $ geography             : Factor w/ 1106 levels "Census Tract 10.01, Lafayette Parish, Louisiana",..: 970 978 985 993 1000 1006 1011 1015 1018 1021 ...
 $ population            : Factor w/ 1019 levels "0","1","10,248",..: 859 710 350 801 692 806 647 804 711 842 ...
 $ total.housing.units   : Factor w/ 905 levels "1","10","1,002",..: 565 404 104 499 357 531 397 522 426 594 ...
 $ occupied.housing.units: Factor w/ 876 levels "0","1","1,001",..: 512 363 74 446 338 468 347 470 374 514 ...
 $ vacant.housing.units  : Factor w/ 374 levels "0","1","10","100",..: 86 54 5 98 9 134 83 86 110 196 ...
The structure tells us that this is a data frame with 1106 rows and six columns. It further tells us the type of each column.
Notice how the FIPS code read in as a number but the other numeric columns read in as “factors”? That's R-speak for a categorical variable, and in the version of R used here any character variables are set to this type by default. This happened because the numbers in those columns have commas. The presence of a single non-numeric character within a number makes R treat the entire column as strings. This will be an issue later when we try to add two numbers together, as R doesn't know how to add two strings.
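To see why these columns cannot simply be converted to numbers as they are, here is a small sketch in base R on made-up values. It illustrates two general pitfalls: calling as.numeric directly on a factor, and calling it on text that contains a comma.

```r
# Toy factor holding numbers stored as text (made-up values)
x <- factor(c('300', '10', '200'))

# as.numeric() on a factor returns the internal level codes, not the values.
# The levels are sorted as text ('10', '200', '300'), so the codes are 3 1 2.
codes <- as.numeric(x)
print(codes)

# Converting to character first recovers the numbers
values <- as.numeric(as.character(x))
print(values)

# A comma makes the text non-numeric, so the conversion yields NA
# (R also raises a warning, silenced here)
with.comma <- suppressWarnings(as.numeric('6,188'))
print(with.comma)
```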
The solution: we need to remove the commas from all the strings, then recast each variable as a number.
To help with this we are going to use another package called stringr, and a function from within it called str_replace:
# install.packages('stringr')
library('stringr')
Warning message: : package ‘stringr’ was built under R version 3.2.5
Let's start with the population variable. First, let's remove the comma and write the result to the original column. (The format for calling a column from a data frame in R is df.name$column.name)
census2000.trimmed$population <- str_replace(
    census2000.trimmed$population,
    pattern = ',',
    replacement = ''
)
Then we'll visually inspect the head:
head(census2000.trimmed)
| | fips.code | geography | population | total.housing.units | occupied.housing.units | vacant.housing.units |
---|---|---|---|---|---|---|
1 | 22001960100 | Census Tract 9601, Acadia Parish, Louisiana | 6188 | 2,410 | 2,236 | 174 |
2 | 22001960200 | Census Tract 9602, Acadia Parish, Louisiana | 5056 | 1,909 | 1,764 | 145 |
3 | 22001960300 | Census Tract 9603, Acadia Parish, Louisiana | 3149 | 1,246 | 1,145 | 101 |
4 | 22001960400 | Census Tract 9604, Acadia Parish, Louisiana | 5617 | 2,176 | 1,991 | 185 |
5 | 22001960500 | Census Tract 9605, Acadia Parish, Louisiana | 4927 | 1,796 | 1,692 | 104 |
6 | 22001960600 | Census Tract 9606, Acadia Parish, Louisiana | 5654 | 2,292 | 2,073 | 219 |
This appeared to work. But R will still think this is a character variable unless we explicitly tell it otherwise:
census2000.trimmed$population <- as.numeric(census2000.trimmed$population)
Running str will help us ensure this worked:
str(census2000.trimmed)
'data.frame': 1106 obs. of 6 variables:
 $ fips.code             : num 2.2e+10 2.2e+10 2.2e+10 2.2e+10 2.2e+10 ...
 $ geography             : Factor w/ 1106 levels "Census Tract 10.01, Lafayette Parish, Louisiana",..: 970 978 985 993 1000 1006 1011 1015 1018 1021 ...
 $ population            : num 6188 5056 3149 5617 4927 ...
 $ total.housing.units   : Factor w/ 905 levels "1","10","1,002",..: 565 404 104 499 357 531 397 522 426 594 ...
 $ occupied.housing.units: Factor w/ 876 levels "0","1","1,001",..: 512 363 74 446 338 468 347 470 374 514 ...
 $ vacant.housing.units  : Factor w/ 374 levels "0","1","10","100",..: 86 54 5 98 9 134 83 86 110 196 ...
For the rest of the columns we can nest the first function within the second to speed things up:
census2000.trimmed$total.housing.units <- as.numeric(str_replace(census2000.trimmed$total.housing.units, pattern = ',', replacement = ''))
census2000.trimmed$occupied.housing.units <- as.numeric(str_replace(census2000.trimmed$occupied.housing.units, pattern = ',', replacement = ''))
census2000.trimmed$vacant.housing.units <- as.numeric(str_replace(census2000.trimmed$vacant.housing.units, pattern = ',', replacement = ''))
str(census2000.trimmed)
'data.frame': 1106 obs. of 6 variables:
 $ fips.code             : num 2.2e+10 2.2e+10 2.2e+10 2.2e+10 2.2e+10 ...
 $ geography             : Factor w/ 1106 levels "Census Tract 10.01, Lafayette Parish, Louisiana",..: 970 978 985 993 1000 1006 1011 1015 1018 1021 ...
 $ population            : num 6188 5056 3149 5617 4927 ...
 $ total.housing.units   : num 2410 1909 1246 2176 1796 ...
 $ occupied.housing.units: num 2236 1764 1145 1991 1692 ...
 $ vacant.housing.units  : num 174 145 101 185 104 219 171 174 196 284 ...
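When the same cleaning step is repeated for several columns, one option is to wrap it in a small function and apply it to a list of column names. The sketch below uses only base R (gsub instead of stringr) on a toy data frame; the helper name to.number is our own invention, not part of any package:

```r
# Hypothetical helper: strip commas, then convert the text to numbers
to.number <- function(x) {
    as.numeric(gsub(',', '', as.character(x), fixed = TRUE))
}

# Toy data frame with values copied from the first two rows shown earlier
toy <- data.frame(
    total.housing.units = c('2,410', '1,909'),
    occupied.housing.units = c('2,236', '1,764'),
    vacant.housing.units = c('174', '145')
)

# Names of the columns that need cleaning
columns.to.clean <- c(
    'total.housing.units', 'occupied.housing.units', 'vacant.housing.units'
)

# lapply() runs the helper on each column; the results replace the originals
toy[columns.to.clean] <- lapply(toy[columns.to.clean], to.number)

str(toy)
```

Writing the step once makes it less likely that a copy-and-paste slip cleans one column and assigns the result to another.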
By default, head will print the first six rows. But we can override the default to show as many as we want (we'll show 10 here):
head(census2000.trimmed, n = 10)
| | fips.code | geography | population | total.housing.units | occupied.housing.units | vacant.housing.units |
---|---|---|---|---|---|---|
1 | 22001960100 | Census Tract 9601, Acadia Parish, Louisiana | 6188 | 2410 | 2236 | 174 |
2 | 22001960200 | Census Tract 9602, Acadia Parish, Louisiana | 5056 | 1909 | 1764 | 145 |
3 | 22001960300 | Census Tract 9603, Acadia Parish, Louisiana | 3149 | 1246 | 1145 | 101 |
4 | 22001960400 | Census Tract 9604, Acadia Parish, Louisiana | 5617 | 2176 | 1991 | 185 |
5 | 22001960500 | Census Tract 9605, Acadia Parish, Louisiana | 4927 | 1796 | 1692 | 104 |
6 | 22001960600 | Census Tract 9606, Acadia Parish, Louisiana | 5654 | 2292 | 2073 | 219 |
7 | 22001960700 | Census Tract 9607, Acadia Parish, Louisiana | 4614 | 1894 | 1723 | 171 |
8 | 22001960800 | Census Tract 9608, Acadia Parish, Louisiana | 5640 | 2254 | 2080 | 174 |
9 | 22001960900 | Census Tract 9609, Acadia Parish, Louisiana | 5059 | 1978 | 1782 | 196 |
10 | 22001961000 | Census Tract 9610, Acadia Parish, Louisiana | 5965 | 2526 | 2242 | 284 |
That worked!
But in the interest of full disclosure, you should know that we added those commas to the original CSVs from the Census Bureau to facilitate this exercise. “Commafied” numbers are one of the most frequent stumbling blocks to creating a cleaned data set.
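One caveat if you reuse this approach on other data: str_replace replaces only the first match in each string. That is enough for the tract-level values shown here, which contain at most one comma, but a value in the millions would keep its second comma. stringr also provides str_replace_all, which replaces every match. A short sketch with a made-up number:

```r
library('stringr')

# Made-up value with two commas
big.number <- '1,234,567'

# Only the first comma is removed
first.only <- str_replace(big.number, pattern = ',', replacement = '')
print(first.only)   # "1234,567"

# Every comma is removed
all.commas <- str_replace_all(big.number, pattern = ',', replacement = '')
print(all.commas)   # "1234567"
```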
For our last cleaning exercise, we'll work with the geography column. It has a lot of information in there, but it would be more useful if the census tract, parish name and state were separated, to help us aggregate some of these numbers.
The package tidyr has a function that helps us do just that:
# install.packages('tidyr')
library('tidyr')
Should you run into a function and not know what arguments it takes, typing a question mark followed by the function name and a pair of empty parentheses will bring up the documentation for that function:
# ?separate()
census2000.trimmed <- separate(
    census2000.trimmed, # name of the data frame
    geography, # column to split
    c('tract', 'parish', 'state'), # new column names
    ', ' # delimiter to split on (note the space after the comma)
)
head(census2000.trimmed)
| | fips.code | tract | parish | state | population | total.housing.units | occupied.housing.units | vacant.housing.units |
---|---|---|---|---|---|---|---|---|
1 | 22001960100 | Census Tract 9601 | Acadia Parish | Louisiana | 6188 | 2410 | 2236 | 174 |
2 | 22001960200 | Census Tract 9602 | Acadia Parish | Louisiana | 5056 | 1909 | 1764 | 145 |
3 | 22001960300 | Census Tract 9603 | Acadia Parish | Louisiana | 3149 | 1246 | 1145 | 101 |
4 | 22001960400 | Census Tract 9604 | Acadia Parish | Louisiana | 5617 | 2176 | 1991 | 185 |
5 | 22001960500 | Census Tract 9605 | Acadia Parish | Louisiana | 4927 | 1796 | 1692 | 104 |
6 | 22001960600 | Census Tract 9606 | Acadia Parish | Louisiana | 5654 | 2292 | 2073 | 219 |
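To illustrate why the delimiter includes the space after the comma, here is a rough sketch on a single string using base R's strsplit. It is only meant to show the effect of the delimiter and is not a description of how separate works internally:

```r
# One geography string, copied from the first row of the data
geography <- 'Census Tract 9601, Acadia Parish, Louisiana'

# Split on a comma followed by a space;
# fixed = TRUE treats the delimiter as literal text
parts <- strsplit(geography, ', ', fixed = TRUE)[[1]]
print(parts)

# Splitting on the comma alone leaves a leading space on the later pieces
parts.no.space <- strsplit(geography, ',', fixed = TRUE)[[1]]
print(parts.no.space)
```

A stray leading space matters later on: ' Orleans Parish' and 'Orleans Parish' are different strings, so a filter on the parish name would silently miss rows.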
Our data set is as cleaned up as we need it to be now.
Let's summarize it with a frequency table of the parish names (parishes are Louisiana's equivalent of counties):
table(census2000.trimmed$parish)
Parish | Tracts | Parish | Tracts |
---|---|---|---|
Acadia Parish | 12 | Allen Parish | 5 |
Ascension Parish | 14 | Assumption Parish | 6 |
Avoyelles Parish | 9 | Beauregard Parish | 7 |
Bienville Parish | 5 | Bossier Parish | 19 |
Caddo Parish | 64 | Calcasieu Parish | 41 |
Caldwell Parish | 3 | Cameron Parish | 2 |
Catahoula Parish | 3 | Claiborne Parish | 5 |
Concordia Parish | 5 | De Soto Parish | 7 |
East Baton Rouge Parish | 89 | East Carroll Parish | 3 |
East Feliciana Parish | 4 | Evangeline Parish | 8 |
Franklin Parish | 6 | Grant Parish | 5 |
Iberia Parish | 15 | Iberville Parish | 8 |
Jackson Parish | 5 | Jefferson Davis Parish | 7 |
Jefferson Parish | 123 | Lafayette Parish | 41 |
Lafourche Parish | 22 | La Salle Parish | 3 |
Lincoln Parish | 10 | Livingston Parish | 13 |
Madison Parish | 5 | Morehouse Parish | 8 |
Natchitoches Parish | 9 | Orleans Parish | 181 |
Ouachita Parish | 41 | Plaquemines Parish | 8 |
Pointe Coupee Parish | 6 | Rapides Parish | 34 |
Red River Parish | 2 | Richland Parish | 6 |
Sabine Parish | 7 | St. Bernard Parish | 17 |
St. Charles Parish | 13 | St. Helena Parish | 2 |
St. James Parish | 7 | St. John the Baptist Parish | 11 |
St. Landry Parish | 19 | St. Martin Parish | 9 |
St. Mary Parish | 16 | St. Tammany Parish | 35 |
Tangipahoa Parish | 18 | Tensas Parish | 3 |
Terrebonne Parish | 20 | Union Parish | 6 |
Vermilion Parish | 10 | Vernon Parish | 9 |
Washington Parish | 10 | Webster Parish | 11 |
West Baton Rouge Parish | 4 | West Carroll Parish | 3 |
West Feliciana Parish | 3 | Winn Parish | 4 |
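A frequency table is listed alphabetically by default. If you would rather see the largest groups first, the table can be passed to sort. A minimal sketch on a toy vector (the names are real parishes, but the counts are made up):

```r
# Toy vector of parish names; the counts here are invented
parishes <- c(
    'Orleans Parish', 'Acadia Parish', 'Orleans Parish',
    'Jefferson Parish', 'Orleans Parish', 'Jefferson Parish'
)

# Count how many times each name appears
counts <- table(parishes)

# Put the largest counts first
sorted.counts <- sort(counts, decreasing = TRUE)
print(sorted.counts)

# Name of the most frequent parish in the toy vector
top.parish <- names(sorted.counts)[1]
print(top.parish)
```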
Now we need to run all of the above cleaning steps on the 2010 data:
census2010 <- read.csv('2010_census_demographic_profile.csv', skip = 1)
census2010.trimmed <- select(
    census2010, # name of the data frame
    # list of all the column names we want to keep
    Id2, Geography, Number..SEX.AND.AGE...Total.population,
    Number..HOUSING.OCCUPANCY...Total.housing.units,
    Number..HOUSING.OCCUPANCY...Total.housing.units...Occupied.housing.units,
    Number..HOUSING.OCCUPANCY...Total.housing.units...Vacant.housing.units
)
colnames(census2010.trimmed) <- c(
    'fips.code', 'census.tract', 'population',
    'total.housing.units', 'occupied.housing.units', 'vacant.housing.units'
)
census2010.trimmed$population <- as.numeric(str_replace(census2010.trimmed$population, pattern = ',', replacement = ''))
census2010.trimmed$total.housing.units <- as.numeric(str_replace(census2010.trimmed$total.housing.units, pattern = ',', replacement = ''))
census2010.trimmed$occupied.housing.units <- as.numeric(str_replace(census2010.trimmed$occupied.housing.units, pattern = ',', replacement = ''))
census2010.trimmed$vacant.housing.units <- as.numeric(str_replace(census2010.trimmed$vacant.housing.units, pattern = ',', replacement = ''))
census2010.trimmed <- separate(census2010.trimmed, census.tract, c('tract', 'parish', 'state'), ', ')
orleans2010 <- filter(census2010.trimmed, parish == 'Orleans Parish')
Now that we've cleaned both of our data files, let's merge the 2000 and 2010 data. Merging allows you to link two data sets on values common to both. It is a powerful operation that is hard to carry out with the same versatility in a program like Excel.
In this case, we know that the FIPS code and the character names for most of the tracts should be consistent across the 10-year period.
However, census tracts are added, deleted, split and joined over the course of 10 years. We will make sure to keep all entries from both years. This is what is referred to as a “full outer join.” If we kept only the rows common to both data frames (the default behavior of merge, known as an inner join), we would lose the tracts that appear in only one census.
census.comparison <- merge(
    census2000.trimmed, # first data frame
    census2010.trimmed, # second data frame
    by = c('fips.code', 'tract', 'parish', 'state'), # keys to use for join
    suffixes = c('.00', '.10'), # suffixes to append to new columns
    all = TRUE # specifying to keep all data from both data frames
)
Let's inspect a portion of the data frame where there are full matches and partial matches:
census.comparison[65:69, ]
| | fips.code | tract | parish | state | population.00 | total.housing.units.00 | occupied.housing.units.00 | vacant.housing.units.00 | population.10 | total.housing.units.10 | occupied.housing.units.10 | vacant.housing.units.10 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
65 | 22015010801 | Census Tract 108.01 | Bossier Parish | Louisiana | 3159 | 1527 | 1351 | 176 | 3359 | 1415 | 1284 | 131 |
66 | 22015010803 | Census Tract 108.03 | Bossier Parish | Louisiana | 5362 | 2250 | 2120 | 130 | NA | NA | NA | NA |
67 | 22015010804 | Census Tract 108.04 | Bossier Parish | Louisiana | 6101 | 2383 | 2282 | 101 | 7278 | 2943 | 2815 | 128 |
68 | 22015010805 | Census Tract 108.05 | Bossier Parish | Louisiana | NA | NA | NA | NA | 3238 | 1585 | 1425 | 160 |
69 | 22015010806 | Census Tract 108.06 | Bossier Parish | Louisiana | NA | NA | NA | NA | 4086 | 1671 | 1610 | 61 |
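To see the difference between the default inner join and the full outer join we used, it can help to try both on a tiny example. This is only a sketch with made-up data frames; the names tracts.2000 and tracts.2010 and the id column (standing in for fips.code) are ours.

```r
# Toy data frames (made-up values); 'id' plays the role of fips.code.
tracts.2000 <- data.frame(id = c(1, 2, 3), population = c(10, 20, 30))
tracts.2010 <- data.frame(id = c(2, 3, 4), population = c(21, 31, 41))

# Default merge: keeps only ids present in both data frames (inner join).
inner <- merge(tracts.2000, tracts.2010, by = 'id', suffixes = c('.00', '.10'))

# all = TRUE: keeps every id from either data frame (full outer join),
# filling in NA where one side has no match.
outer <- merge(tracts.2000, tracts.2010, by = 'id', suffixes = c('.00', '.10'),
               all = TRUE)

nrow(inner)  # 2 rows: ids 2 and 3
nrow(outer)  # 4 rows: ids 1 through 4
outer
```

The NA values in the outer result play the same role as the NA entries for tracts 108.03, 108.05 and 108.06 in the Bossier Parish rows above. If you only wanted to keep every row from one side, merge also accepts all.x = TRUE or all.y = TRUE.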
Saving your intermediate work to a file is often good practice, so we will write the results of our merge to a CSV (you can do this with any data frame you create in R).
write.csv(census.comparison, 'census_comparison_result.csv', row.names = FALSE)
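If you want to convince yourself that a saved file can be picked up again later, you can read it straight back in. The sketch below does this with a small stand-in data frame and a temporary file; the object names toy and toy.reloaded are ours, and the values are illustrative.

```r
# Toy data frame (illustrative values) standing in for census.comparison.
toy <- data.frame(tract = c('Census Tract 1', 'Census Tract 2'),
                  population = c(2381, 1347))

# Write to a temporary file so the example leaves nothing behind.
path <- tempfile(fileext = '.csv')
write.csv(toy, path, row.names = FALSE)

# Read it back in and confirm the shape survived the round trip.
toy.reloaded <- read.csv(path)
colnames(toy.reloaded)
nrow(toy.reloaded)
```

Without row.names = FALSE, write.csv adds a column of row numbers, which read.csv would then read back as an extra column named X.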
Let's filter our merged data frame down to just Orleans Parish. Orleans Parish and the city of New Orleans are “coterminous” (that is, they share the same boundaries), so this will isolate only the census tracts of the city.
# note the use of "==" since we are expressing a criterion
orleans <- filter(census.comparison, parish == 'Orleans Parish')
head(orleans)
| | fips.code | tract | parish | state | population.00 | total.housing.units.00 | occupied.housing.units.00 | vacant.housing.units.00 | population.10 | total.housing.units.10 | occupied.housing.units.10 | vacant.housing.units.10 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2.2071e+10 | Census Tract 1 | Orleans Parish | Louisiana | 2381 | 1408 | 1145 | 263 | 2455 | 1513 | 1229 | 284 |
2 | 2.2071e+10 | Census Tract 2 | Orleans Parish | Louisiana | 1347 | 691 | 496 | 195 | 1197 | 738 | 496 | 242 |
3 | 2.2071e+10 | Census Tract 3 | Orleans Parish | Louisiana | 1468 | 719 | 559 | 160 | 1231 | 641 | 467 | 174 |
4 | 2.2071e+10 | Census Tract 4 | Orleans Parish | Louisiana | 2564 | 1034 | 873 | 161 | 2328 | 1137 | 911 | 226 |
5 | 2.2071e+10 | Census Tract 6.01 | Orleans Parish | Louisiana | 2034 | 704 | 506 | 198 | 849 | 328 | 269 | 59 |
6 | 2.2071e+10 | Census Tract 6.02 | Orleans Parish | Louisiana | 2957 | 1106 | 1011 | 95 | 2534 | 1108 | 923 | 185 |
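It may help to see what the "==" test inside filter is doing. The sketch below uses a made-up data frame and base R indexing, which is roughly equivalent to filter for a simple case like this; the names toy, is.orleans and toy.orleans are ours.

```r
# Toy data frame (made-up values).
toy <- data.frame(
    parish = c('Orleans Parish', 'Bossier Parish', 'Orleans Parish'),
    population = c(100, 200, 300),
    stringsAsFactors = FALSE
)

# "==" compares each value and returns a logical vector.
is.orleans <- toy$parish == 'Orleans Parish'
is.orleans

# Base R counterpart of filter(): keep the rows where the test is TRUE.
toy.orleans <- toy[is.orleans, ]
toy.orleans
```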
Now we can do some quick calculations with our new merged data frame for New Orleans.
First question: What was the population of New Orleans in 2000?
That requires summing up the 2000 population column like so:
sum(orleans$population.00)
[1] NA
Why didn't this work? Let's inspect the population.00 variable using summary:
summary(orleans$population.00)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
     52    1711    2274    2678    3141    9931      30
This reveals that there are 30 census tracts that have NA, or missing, values for population.00. By default, sum returns NA if any value in the column is missing. We'll have to tell it to ignore these missing values by specifying na.rm = TRUE:
sum(orleans$population.00, na.rm = TRUE)
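The same behavior is easy to see on a short vector. This is a toy example with made-up numbers, not our census data:

```r
# Toy vector (made-up values) with one missing entry.
x <- c(100, NA, 250)

sum(x)                # NA: one missing value makes the whole sum missing
sum(x, na.rm = TRUE)  # 350: missing values are dropped before summing
sum(is.na(x))         # 1: how many values are missing
```

Counting the missing values with is.na before dropping them is a useful habit, since it tells you how much data the total leaves out.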
Second question: What was the population of New Orleans in 2010?
sum(orleans$population.10, na.rm = TRUE)
This matches the 343,829 people cited in the story exactly.
Last question: What was the percent change in New Orleans population between 2000 and 2010?
To do this, we'll first save each population calculation to new objects. Then we'll create another object to store the percent change.
nola2000pop <- sum(orleans$population.00, na.rm = TRUE)
nola2010pop <- sum(orleans$population.10, na.rm = TRUE)
perc.change.nola <- (nola2010pop - nola2000pop)/nola2000pop * 100
print(paste('The percent change in New Orleans population since 2000 is ', round(perc.change.nola), '%', sep =''))
[1] "The percent change in New Orleans population since 2000 is -29%"
Again, we see that this matches the 29% drop cited by The Times-Picayune article. Yay!
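If you expect to compute percent changes more than once, the formula can be wrapped in a small function. This is a sketch only: the function name percent.change is ours, and the numbers are made up to keep the arithmetic easy to check by hand.

```r
# Hypothetical helper: percent change from an old value to a new one.
percent.change <- function(old, new) {
    (new - old) / old * 100
}

# Made-up numbers: a drop from 200 to 150 is a 25 percent decrease.
percent.change(200, 150)

# The function is vectorized, so it works on whole columns at once.
percent.change(c(200, 80), c(150, 100))
```

Because it is vectorized, the same function could be applied to two population columns of a data frame to get a percent change for every row.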
This concludes our workshop, Getting started with R.
We'll use the merged data CSV we saved above to do further analysis in our next workshop, More with R. Take a sneak peek at the notebook here.
Any questions?