import pandas as pd
# pd.set_option('display.max_colwidth', 50)  # set this if you need wider columns
# The Health Department has developed an inspection report and scoring system.
# After conducting an inspection of the facility, the Health Inspector
# calculates a score based on the violations observed. Violations can fall into:
businesses = pd.read_csv('./data/businesses_plus.csv', parse_dates=True, dtype={'phone_number': str})
businesses.head()
# dtype casts the column as a specific data type
inspections = pd.read_csv('./data/inspections_plus.csv', parse_dates=True)
inspections.head()
violations = pd.read_csv('./data/violations_plus.csv', parse_dates=True)
violations.head()
# 1 Combine the three dataframes into one data frame called restaurant_scores
# Hint: http://pandas.pydata.org/pandas-docs/stable/merging.html
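One possible approach for question 1 is to chain two `merge` calls. This sketch uses toy stand-in frames; it assumes the three CSVs share a key like `business_id`, which you should verify against the real columns first.

```python
import pandas as pd

# Toy stand-ins for the three CSVs; real column names may differ,
# but a shared "business_id" key is the assumption here.
businesses = pd.DataFrame({"business_id": [1, 2], "name": ["Cafe A", "Diner B"]})
inspections = pd.DataFrame({"business_id": [1, 1, 2], "Score": [90, 85, 70]})
violations = pd.DataFrame({"business_id": [1, 2], "description": ["dirty floor", "no soap"]})

# Chain two merges: businesses <- inspections <- violations
restaurant_scores = (businesses.merge(inspections, on="business_id")
                               .merge(violations, on="business_id"))
print(restaurant_scores.shape)
```

Depending on how many rows you want to keep when keys are missing on one side, `how="left"` or `how="outer"` may be more appropriate than the default inner join.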
# 2 Which ten businesses have had the most inspections?
# 3 Group and count the inspections by type
# 4 Create a plot that shows number of inspections per month
# Bonus for creating a heatmap
# http://stanford.edu/~mwaskom/software/seaborn/generated/seaborn.heatmap.html?highlight=heatmap
# 5 Which zip code contains the most high risk violations?
# 6 If inspection is prompted by a change in restaurant ownership,
# is the inspection more likely to be categorized as higher or lower risk?
# 7 Examining the descriptions, what is the most common violation?
# 8 Create a hist of the scores with 10 bins
# 9 Can you predict risk category based on the other features in this dataset?
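For question 9, a minimal classification sketch with scikit-learn: one-hot encode the categorical features and fit a tree. The feature names here are assumptions, and scoring on the training data is only a sanity check, not an evaluation.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Toy features; in the real data you would encode columns such as
# inspection type, zip code, and score (names assumed, not verified).
df = pd.DataFrame({
    "Score": [95, 60, 88, 55],
    "zip": ["94110", "94103", "94110", "94103"],
    "risk_category": ["Low Risk", "High Risk", "Low Risk", "High Risk"],
})
X = pd.get_dummies(df[["Score", "zip"]])  # one-hot encode the zip column
y = df["risk_category"]

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
print(clf.score(X, y))  # training accuracy only; use a train/test split in practice
```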
# 10 Extra Credit:
# Use Instagram location API to find pictures taken at the lat, long of the most High Risk restaurant
# https://www.instagram.com/developer/endpoints/locations/
############################
### A Little More Morbid ###
############################
killings = pd.read_csv('./data/police-killings.csv')
killings.head()
# 1. Make the following changes to column names:
# lawenforcementagency -> agency
# raceethnicity -> race
# 2. Show the count of missing values in each column
# 3. Replace each null value in the dataframe with the string "Unknown"
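Questions 1-3 combine three common idioms: `rename`, `isnull().sum()`, and `fillna`. A sketch on a toy frame (only the two column names given in the prompt are taken from the source):

```python
import pandas as pd
import numpy as np

# Toy frame mimicking the killings data.
killings = pd.DataFrame({
    "lawenforcementagency": ["PD A", np.nan],
    "raceethnicity": ["White", "Black"],
})

# 1. Rename the columns
killings = killings.rename(columns={"lawenforcementagency": "agency",
                                    "raceethnicity": "race"})
# 2. Count missing values per column
missing = killings.isnull().sum()
# 3. Replace nulls with "Unknown"
killings = killings.fillna("Unknown")
print(missing.to_dict(), killings.loc[1, "agency"])
```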
# 4. How many killings were there so far in 2015?
# 5. Of all killings, how many were male and how many female?
# 6. How many killings were of unarmed people?
# 7. What percentage of all killings were unarmed?
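For questions 6 and 7, the mean of a boolean mask gives a proportion directly. This assumes the `armed` column encodes unarmed as the string `"No"`; check the real values with `value_counts()` first.

```python
import pandas as pd

# Assumed encoding: "No" means unarmed (verify against the real data).
killings = pd.DataFrame({"armed": ["No", "Firearm", "No", "Knife"]})

unarmed_count = (killings["armed"] == "No").sum()
unarmed_pct = (killings["armed"] == "No").mean() * 100
print(unarmed_count, unarmed_pct)
```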
# 8. What are the 5 states with the most killings?
# 9. Show a value counts of deaths for each race
# 10. Display a histogram of ages of all killings
# 11. Show 6 histograms of ages by race
# 12. What is the average age of death by race?
# 13. Show a bar chart with counts of deaths every month
###################
### Less Morbid ###
###################
majors = pd.read_csv('./data/college-majors.csv')
majors.head()
# 1. Delete the columns (employed_full_time_year_round, major_code)
# 2. Show the count of missing values in each column
# 3. What are the top 10 highest paying majors?
# 4. Plot the data from the last question in a bar chart, include proper title, and labels!
# 5. What is the average median salary for each major category?
# 6. Show only the top 5 paying major categories
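Questions 5 and 6 are a groupby-mean followed by a sort. The column names `major_category` and `median` below are guessed from the prompt wording, not verified against the CSV.

```python
import pandas as pd

# Toy majors data with assumed column names.
majors = pd.DataFrame({
    "major_category": ["Engineering", "Engineering", "Arts", "Arts", "Business"],
    "median": [80000, 70000, 35000, 30000, 55000],
})

# 5. Average median salary per category
avg_by_cat = majors.groupby("major_category")["median"].mean()
# 6. Top 5 paying categories
top5 = avg_by_cat.sort_values(ascending=False).head(5)
print(top5.to_dict())
```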
# 7. Plot a histogram of the distribution of median salaries
# 8. Plot a histogram of the distribution of median salaries by major category
# 9. What are the top 10 most UNemployed majors?
# What are the unemployment rates?
# 10. What are the top 10 most UNemployed major CATEGORIES? Use the mean for each category
# What are the unemployment rates?
# 11. The total and employed columns refer to the people that were surveyed.
# Create a new column showing the employment rate of the people surveyed
# for each major; call it "sample_employment_rate"
# Example: the first row has total: 128148 and employed: 90245, so its
# sample_employment_rate should be 90245.0 / 128148.0 = 0.7042
# 12. Create a "sample_unemployment_rate" column
# this column should be 1 - "sample_employment_rate"
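Questions 11 and 12 are vectorized column arithmetic. This sketch uses the example figures from the prompt itself (total 128148, employed 90245):

```python
import pandas as pd

# One-row toy frame using the prompt's example numbers.
majors = pd.DataFrame({"total": [128148], "employed": [90245]})

# 11. Employment rate of the surveyed sample
majors["sample_employment_rate"] = majors["employed"] / majors["total"]
# 12. Its complement
majors["sample_unemployment_rate"] = 1 - majors["sample_employment_rate"]
print(round(majors.loc[0, "sample_employment_rate"], 4))
```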