#!/usr/bin/env python
# coding: utf-8

# # Guided Project: Data Visualization Fun[damentals]
#
# In this project I look forward to selecting and applying various ways of graphically representing information derived specifically from **I-94 westbound traffic data**.
#
# This effort to identify indicators of traffic slowdowns is an example of **exploratory analysis**.
#
#
# # Summary of Conclusions
#
# ## Daytime vs. Nighttime
#
# The nighttime (7pm - 7am) traffic volume histogram is right-skewed, indicating that low traffic volumes dominate. Nighttime data was excluded from the rest of the analysis because it offers little insight into indicators of high traffic volume.
#
# ## Month
#
# The winter months (Nov-Feb) have lower traffic volume, with the lowest in Dec/Jan. This could be a combination of the holidays and the colder weather.
#
# July was an exception to the higher traffic volumes observed from spring through fall (Mar-Oct), with a noticeable dip. This dip is curious but does not provide insight into indicators of high traffic volume.
#
# ## Day of the Week
#
# Business days (Mon-Fri) showed much higher traffic volume than weekends (Sat-Sun), and the line graphs of hourly traffic volume for each displayed two very distinct trends.
#
# Business-day traffic volume peaked very high at 7:00 and then again at 16:00.
#
# Weekend traffic volume started low in the morning and increased steadily before leveling off at a moderate level throughout the afternoon.
#
# ## Weather
#
# There was mostly just a weak correlation between the weather measurements and traffic volume.
#
# However, on closer inspection, weather events that specifically combined rain and snow had higher traffic volumes than events that involved just rain or just snow. An explanation could be icy road conditions.
#
#
# # Full Exploratory Analysis
#
# **Let's get probing!!**
#
#
# ## First Peek at the Data Set
#
# There are only 9 columns in the dataset, mostly related to weather.
# The remaining columns give the date and time, whether the day was a holiday, and of course the traffic volume.
#
# The data set is complete, with no values missing.

# In[1]:


import pandas as pd

dataset = pd.read_csv('Metro_Interstate_Traffic_Volume.csv')
print(dataset.head())
print(dataset.tail())
dataset.info()  # info() prints its own report and returns None


# ## Traffic Volume - Histogram

# In[2]:


import matplotlib.pyplot as plt
get_ipython().run_line_magic('matplotlib', 'inline')

dataset["traffic_volume"].plot.hist(bins=10)
plt.xlabel("Traffic Volume")
plt.show()


# ## Traffic Volume - Quick Stats

# In[3]:


dataset["traffic_volume"].describe()


# ## Traffic Volume - Initial Observations
#
# Based on the series description and histogram generated above, the most common traffic volumes are less than 1000 cars or close to 5000 cars.
#
# Time of day certainly influences the number of cars on the road. Given the shape of the histogram, the data may be indicating a nightly average around 1000 cars and a daytime average around 4500.

# ## Traffic Volume : Daytime vs. Nighttime
# ## Preparing the Dataset : Append Hour
#
# The code below converts the date_time column from strings to datetime format and then appends the hour values as a 10th column of the dataset.

# In[4]:


dataset["date_time"] = pd.to_datetime(dataset["date_time"])
dataset["hour"] = dataset["date_time"].dt.hour
print(dataset.head())


# ## Traffic Volume : Daytime vs. Nighttime
# ## Preparing the Dataset : Split by Hour
# Next we'll separate the data into a **dayset** (7am up to 7pm) and a **nightset** (7pm up to 7am).
#

# In[5]:


bool_day = (dataset["hour"] >= 7) & (dataset["hour"] < 19)  # hours 7 through 18
bool_night = ~bool_day                                      # hours 19 through 6

dayset = dataset.loc[bool_day].copy()
nightset = dataset.loc[bool_night].copy()


# ## Traffic Volume: Daytime vs. Nighttime
# ## Histogram Grid

# In[6]:


plt.figure(figsize=(12,5))

# Both panels share the same axis limits so the two distributions are directly comparable
plt.subplot(1,2,1)
plt.hist(dayset["traffic_volume"], bins=10)
plt.title("Daytime Traffic Volumes")
plt.xlabel("Traffic Volume")
plt.ylabel("Frequency")
plt.xlim([0,8000])
plt.ylim([0,8000])

plt.subplot(1,2,2)
plt.hist(nightset["traffic_volume"], bins=10)
plt.title("Nighttime Traffic Volumes")
plt.xlabel("Traffic Volume")
plt.ylabel("Frequency")
plt.xlim([0,8000])
plt.ylim([0,8000])

plt.show()


# ## Traffic Volume: Daytime vs. Nighttime
# ## Quick Stats

# In[7]:


print("Daytime Traffic Volume Statistics:")
print(dayset["traffic_volume"].describe())
print()
print("Nighttime Traffic Volume Statistics:")
print(nightset["traffic_volume"].describe())


# ## Traffic Volume: Daytime vs. Nighttime
# ## Observations
#
# The **dayset** histogram is **left-skewed**, with higher traffic volumes occurring more often during this period.
#
# The **nightset** histogram is **right-skewed**, with lower traffic volumes occurring more often.
#
# Since nighttime data reflects lower traffic volumes, it will not be as useful for finding indicators of heavy traffic, which is significantly more common during the day.
#
# > Moving forward we'll focus on the **dayset** data.
#

# ## Daytime Traffic Volume : Month Influence
# ## Preparing the Data : Append Month
#
# Add an 11th column to store the month, using the same method as for the hour:

# In[8]:


dayset["month"] = dayset["date_time"].dt.month


# ## Daytime Traffic Volume : Month Influence
# ## Generate Statistics : Traffic Volume Monthly Means
#
# Generate the monthly traffic volume averages and plot them.
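# A minimal sketch of this aggregation on a tiny made-up frame (the timestamps and volumes are invented for illustration, not taken from the I-94 data). Selecting the `traffic_volume` column before calling `.mean()` keeps text columns such as `weather_main` out of the average; as far as I know, pandas 2.0 and later raise an error when asked to average them.

```python
import pandas as pd

# Hypothetical toy data: values are invented for illustration only.
toy = pd.DataFrame({
    "date_time": pd.to_datetime([
        "2016-01-04 09:00:00", "2016-01-05 09:00:00",
        "2016-07-04 09:00:00", "2016-07-05 09:00:00",
    ]),
    "weather_main": ["Snow", "Clouds", "Clear", "Rain"],
    "traffic_volume": [4000, 4400, 4600, 4800],
})

# Same approach as for the hour: derive the month from the datetime column.
toy["month"] = toy["date_time"].dt.month

# Average only the column of interest within each month.
monthly_mean = toy.groupby("month")["traffic_volume"].mean()
print(monthly_mean)
```

# On the real data the same pattern would read `dayset.groupby("month")["traffic_volume"].mean()`.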
# In[9]:


# numeric_only=True skips the text columns, which recent pandas versions refuse to average
by_month = dayset.groupby("month").mean(numeric_only=True)
by_month['traffic_volume']


# ## Daytime Traffic Volume : Month Influence
# ## Visualize Data : Line Graph of Monthly Mean Traffic Volume
#
# Plotting the monthly mean values, we can see that traffic volume is fairly steady at around 4900 each month, but the **traffic volume dips around the times when people often take vacation: Nov-Feb (Thanksgiving/Christmas) and July**.

# In[10]:


plt.plot(by_month["traffic_volume"])
plt.title("Traffic Volume by Month")
plt.xlabel("Month")
plt.ylabel("Mean Traffic Volume")
plt.show()


# ## Daytime Traffic Volume : Day of Week Influence
# ## Preparing the Data : *per Day of Week*
#
# Use a similar technique to identify the day of the week of each data row and generate mean traffic volumes for each day of the week (DOW).

# In[11]:


dayset['dayofweek'] = dayset['date_time'].dt.dayofweek
by_dayofweek = dayset.groupby('dayofweek').mean(numeric_only=True)
by_dayofweek['traffic_volume']  # 0 is Monday, 6 is Sunday


# ## Daytime Traffic Volume : Day of Week Influence
# ## Visualize Data : Line Graph of Day of Week Mean Traffic Volume
#
# Plotting the mean values for each day of the week, we can see that **traffic volume is lower on Saturday and Sunday than it is from Monday-Friday**.

# In[12]:


plt.plot(by_dayofweek["traffic_volume"])
plt.title("Traffic Volume by Day of Week")
plt.xlabel("Day of Week")
plt.ylabel("Mean Traffic Volume")
plt.xticks(ticks=[0,1,2,3,4,5,6], labels=["Mon", "Tues", "Weds", "Thurs", "Fri", "Sat", "Sun"])
plt.show()


# ## Daytime Traffic Volume - Business Days vs. Weekends
# ## Generate Hourly Statistics
#
# Split the dataset again, this time based on the day of week value, making a set for business days and a set for weekends.
#
# Calculate the mean traffic volume per hour for each day of week set.
#
# At first glance we can already see that 7am has 4x higher traffic volume on weekdays (\~6000) compared with weekends (\~1500).
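# The 7am comparison can be sketched on invented numbers chosen to mirror the rough levels quoted above (these rows are hypothetical, not from the I-94 data):

```python
import pandas as pd

# Hypothetical toy data: two business days and two weekend days at 07:00.
toy = pd.DataFrame({
    "date_time": pd.to_datetime([
        "2016-03-07 07:00:00",  # Monday
        "2016-03-08 07:00:00",  # Tuesday
        "2016-03-12 07:00:00",  # Saturday
        "2016-03-13 07:00:00",  # Sunday
    ]),
    "traffic_volume": [5900, 6100, 1600, 1400],
})

toy["hour"] = toy["date_time"].dt.hour
toy["dayofweek"] = toy["date_time"].dt.dayofweek  # 0 is Monday, 6 is Sunday

# Business days are Monday (0) through Friday (4); everything else is weekend.
is_business = toy["dayofweek"] <= 4
at_7am = toy["hour"] == 7

business_7am = toy.loc[is_business & at_7am, "traffic_volume"].mean()
weekend_7am = toy.loc[~is_business & at_7am, "traffic_volume"].mean()
ratio = business_7am / weekend_7am

print(f"Business days: {business_7am:.0f}, weekends: {weekend_7am:.0f}, ratio: {ratio:.1f}x")
```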
# In[13]:


business = dayset[dayset["dayofweek"] <= 4].copy()  # Mon-Fri
weekend = dayset[dayset["dayofweek"] > 4].copy()    # Sat-Sun

# numeric_only=True skips the text columns, which recent pandas versions refuse to average
by_hour_business = business.groupby("hour").mean(numeric_only=True)
by_hour_weekend = weekend.groupby("hour").mean(numeric_only=True)

print("Traffic Volume per Hour on Business Days (Mon-Fri)")
print(by_hour_business["traffic_volume"])
print()
print("Traffic Volume per Hour on Weekends (Sat-Sun)")
print(by_hour_weekend["traffic_volume"])


# ## Daytime Traffic Volume - Business Days vs. Weekends
# ## Visualize Data - Plot Grid of Hourly Traffic Volume Comparison
#
# The plot grid below shows **very different hourly traffic volume levels throughout the day between business days and weekends, as represented by the different shapes of their line graphs**.
#
# **On business days** the traffic volume starts high early in the morning, comes down mid-morning and then slowly builds to another peak in the late afternoon/early evening. *These peaks in mean vehicle traffic represent the times of day people are most likely to be commuting.*
#
# **On weekends**, however, traffic volume early in the morning is low, climbing steadily until noon, when it stabilizes at a medium level for the rest of the afternoon before tapering off slowly into the evening.

# In[14]:


plt.figure(figsize=(12,5))

plt.subplot(1,2,1)
plt.plot(by_hour_business["traffic_volume"])
plt.title("Business Day (M-F) Traffic Volume by Hour")
plt.xlabel("Hour")
plt.xticks(ticks=[8, 10, 12, 14, 16, 18], labels=["8:00", "10:00", "12:00", "14:00", "16:00", "18:00"], rotation=30)
plt.ylabel("Mean Traffic Volume")
plt.ylim([0,7000])

plt.subplot(1,2,2)
plt.plot(by_hour_weekend["traffic_volume"])
plt.title("Weekend (Sat-Sun) Traffic Volume by Hour")
plt.xlabel("Hour")
plt.xticks(ticks=[8, 10, 12, 14, 16, 18], labels=["8:00", "10:00", "12:00", "14:00", "16:00", "18:00"], rotation=30)
plt.ylabel("Mean Traffic Volume")
plt.ylim([0,7000])  # same y-axis range as the business-day panel so the levels are comparable

plt.show()


# ## Daytime Traffic Volume - Weather Influence
# ## Explore Correlation with Weather Metrics
#
# Viewing the correlation values, we find mostly a **weak [absolute] correlation of traffic volume with the numerical weather metrics**. *Temperature appears to have slightly more influence than precipitation or cloud cover.*

# In[15]:


print("Correlation with Traffic Volume:")
print(
    dayset[["temp", "rain_1h", "snow_1h", "clouds_all"]].corrwith(dayset["traffic_volume"])
)


# ## Daytime Traffic Volume - Weather Influence
# ## Bar Plot of Weather Descriptors
#
# Create statistics based on weather qualifiers (type categories).
#
# Looking at the main weather events, there are no values that stand out as above average.
#
# However, when looking in more detail, **weather events that involve both rain and snow have higher traffic volume**. Specifically, the weather descriptions "shower snow" and "light rain and snow" have higher traffic volume. The explanation could involve icy conditions.
#
# *The only other event combining both types of precipitation is "light shower and snow", but assuming the severity of rain follows a hierarchy from least to most severe of [light shower, shower, light rain], "light shower" and snow may not lead to icy conditions.*

# In[16]:


# numeric_only=True skips the text columns, which recent pandas versions refuse to average
by_weather_main = dayset.groupby('weather_main').mean(numeric_only=True)
by_weather_description = dayset.groupby('weather_description').mean(numeric_only=True)

by_weather_main["traffic_volume"].plot.barh()
plt.title("Traffic Volume for Main Weather Events")
plt.xlabel("Traffic Volume")
plt.ylabel("Main Weather Events")
plt.show()

by_weather_description["traffic_volume"].plot.barh(figsize=(5,10))
plt.show()
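# A group mean can be driven by a handful of rows, so it may be worth checking how many observations stand behind each weather description before reading too much into the highest values. A minimal sketch on invented data (the descriptions are reused as labels only; the counts and volumes are hypothetical, not from the I-94 dataset):

```python
import pandas as pd

# Hypothetical toy data: one rare weather description with a single high reading.
toy = pd.DataFrame({
    "weather_description": [
        "light rain", "light rain", "light rain", "light rain",
        "light rain and snow",
        "sky is clear", "sky is clear", "sky is clear",
    ],
    "traffic_volume": [4700, 4900, 4800, 4600, 5600, 4900, 5000, 4800],
})

# Report the mean alongside the number of rows it is based on.
summary = (
    toy.groupby("weather_description")["traffic_volume"]
    .agg(["mean", "count"])
    .sort_values("mean", ascending=False)
)
print(summary)
```

# In this toy case the top-ranked description rests on a single row, which is the kind of situation where the mean alone could mislead.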