Project: Finding Heavy Traffic Indicators on I-94

The goal of this project is to identify a few indicators of heavy traffic on the I-94 highway. Candidate indicators include weather type, time of day, and day of the week.
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
I_94 = pd.read_csv('Metro_Interstate_Traffic_Volume.csv')
# info() prints its summary itself and returns None, so it is not wrapped in print()
I_94.info()
print(I_94.head())
print(I_94.tail())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48204 entries, 0 to 48203
Data columns (total 9 columns):
 #   Column               Non-Null Count  Dtype
---  ------               --------------  -----
 0   holiday              48204 non-null  object
 1   temp                 48204 non-null  float64
 2   rain_1h              48204 non-null  float64
 3   snow_1h              48204 non-null  float64
 4   clouds_all           48204 non-null  int64
 5   weather_main         48204 non-null  object
 6   weather_description  48204 non-null  object
 7   date_time            48204 non-null  object
 8   traffic_volume       48204 non-null  int64
dtypes: float64(3), int64(2), object(4)
memory usage: 3.3+ MB
  holiday    temp  rain_1h  snow_1h  clouds_all weather_main  \
0    None  288.28      0.0      0.0          40       Clouds
1    None  289.36      0.0      0.0          75       Clouds
2    None  289.58      0.0      0.0          90       Clouds
3    None  290.13      0.0      0.0          90       Clouds
4    None  291.14      0.0      0.0          75       Clouds

  weather_description            date_time  traffic_volume
0    scattered clouds  2012-10-02 09:00:00            5545
1       broken clouds  2012-10-02 10:00:00            4516
2     overcast clouds  2012-10-02 11:00:00            4767
3     overcast clouds  2012-10-02 12:00:00            5026
4       broken clouds  2012-10-02 13:00:00            4918
      holiday    temp  rain_1h  snow_1h  clouds_all  weather_main  \
48199    None  283.45      0.0      0.0          75        Clouds
48200    None  282.76      0.0      0.0          90        Clouds
48201    None  282.73      0.0      0.0          90  Thunderstorm
48202    None  282.09      0.0      0.0          90        Clouds
48203    None  282.12      0.0      0.0          90        Clouds

          weather_description            date_time  traffic_volume
48199           broken clouds  2018-09-30 19:00:00            3543
48200         overcast clouds  2018-09-30 20:00:00            2781
48201  proximity thunderstorm  2018-09-30 21:00:00            2159
48202         overcast clouds  2018-09-30 22:00:00            1450
48203         overcast clouds  2018-09-30 23:00:00             954
I_94['traffic_volume'].plot.hist()
I_94['traffic_volume'].describe()
count    48204.000000
mean      3259.818355
std       1986.860670
min          0.000000
25%       1193.000000
50%       3380.000000
75%       4933.000000
max       7280.000000
Name: traffic_volume, dtype: float64
The histogram shows the distribution of hourly traffic volume, not of time of day, so it cannot by itself show when traffic peaks. What the summary statistics do show is a wide spread: about 25% of the time, 1,193 or fewer cars pass per hour, while about 25% of the time the volume is 4,933 or more, roughly four times as much. The mean (about 3,260) and the median (3,380) are close to each other. A plausible explanation for the gap between the low and high values is time of day, so the next step compares daytime and nighttime traffic.
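The quartile reading can be checked directly by measuring what share of hourly records falls below or above a given volume. The following is a minimal sketch, assuming a small made-up series in place of `I_94['traffic_volume']`; only the two thresholds (1,193 and 4,933) come from the `describe()` output, and the ten values are purely illustrative.

```python
import pandas as pd

# Hypothetical hourly volumes standing in for I_94['traffic_volume'].
volume = pd.Series([300, 500, 900, 1200, 3000, 3400, 4800, 5000, 5200, 6000])

# The mean of a boolean mask is the fraction of True values.
share_low = (volume < 1193).mean()
share_high = (volume >= 4933).mean()

print(f"share below 1193: {share_low:.0%}")
print(f"share at or above 4933: {share_high:.0%}")
```

On the real data, the same two lines applied to `I_94['traffic_volume']` should give shares close to 25% each, since the thresholds are the first and third quartiles.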
I_94['date_time'] = pd.to_datetime(I_94['date_time'])
daytime = I_94[(I_94['date_time'].dt.hour >= 7) & (I_94['date_time'].dt.hour < 19)]
nighttime = I_94[(I_94['date_time'].dt.hour >= 19) | (I_94['date_time'].dt.hour < 7)]
print(daytime.shape)
print(nighttime.shape)
print(nighttime.head())
(23877, 9)
(24327, 9)
   holiday    temp  rain_1h  snow_1h  clouds_all weather_main  \
10    None  290.97      0.0      0.0          20       Clouds
11    None  289.38      0.0      0.0           1        Clear
12    None  288.61      0.0      0.0           1        Clear
13    None  287.16      0.0      0.0           1        Clear
14    None  285.45      0.0      0.0           1        Clear

   weather_description            date_time  traffic_volume
10          few clouds  2012-10-02 19:00:00            3539
11        sky is clear  2012-10-02 20:00:00            2784
12        sky is clear  2012-10-02 21:00:00            2361
13        sky is clear  2012-10-02 22:00:00            1529
14        sky is clear  2012-10-02 23:00:00             963
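Filtering a DataFrame with a boolean mask can later trigger pandas' `SettingWithCopyWarning` when new columns are assigned to the filtered result. One common way around this is to take an explicit `.copy()` of each subset. The sketch below assumes a tiny hypothetical frame `toy` with the same `date_time` and `traffic_volume` columns; the names `is_day`, `day`, and `night` are illustrative, not part of the notebook.

```python
import pandas as pd

# Hypothetical four-row frame with the same column names as the dataset.
toy = pd.DataFrame({
    "date_time": pd.to_datetime([
        "2012-10-02 06:00:00", "2012-10-02 07:00:00",
        "2012-10-02 18:00:00", "2012-10-02 19:00:00",
    ]),
    "traffic_volume": [800, 6000, 4400, 3500],
})

hour = toy["date_time"].dt.hour
# between() is inclusive on both ends by default: hours 7 through 18.
is_day = hour.between(7, 18)

# .copy() makes each subset an independent DataFrame.
day = toy[is_day].copy()
night = toy[~is_day].copy()

# Adding a column to an explicit copy does not raise SettingWithCopyWarning.
day["hour"] = day["date_time"].dt.hour
print(day)
print(night)
```

Using one mask and its negation also guarantees that the two subsets are disjoint and together cover every row.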
plt.figure(figsize=(10,6))
plt.subplot(1,2,1)
plt.hist(daytime['traffic_volume'])
plt.title('Traffic volume by daytime')
plt.xlabel('Traffic Volume')
plt.ylabel('Frequency')
plt.ylim(0,8000)
plt.xlim(-100,8000)
plt.subplot(1,2,2)
plt.hist(nighttime['traffic_volume'])
plt.title('Traffic volume by nighttime')
plt.xlabel('Traffic Volume')
plt.ylabel('Frequency')
plt.ylim(0,8000)
plt.xlim(-100,8000)
plt.show()
daytime['traffic_volume'].describe()
count    23877.000000
mean      4762.047452
std       1174.546482
min          0.000000
25%       4252.000000
50%       4820.000000
75%       5559.000000
max       7280.000000
Name: traffic_volume, dtype: float64
nighttime['traffic_volume'].describe()
count    24327.000000
mean      1785.377441
std       1441.951197
min          0.000000
25%        530.000000
50%       1287.000000
75%       2819.000000
max       6386.000000
Name: traffic_volume, dtype: float64
The daytime histogram is roughly bell-shaped but skewed to the left: most hourly volumes are high, with a tail toward low values. This is consistent with the summary statistics, where the mean (about 4,762) sits just below the median (4,820) and 75% of daytime hours have at least 4,252 cars. Most of the heavy traffic therefore occurs during the daytime.
The nighttime histogram, on the other hand, is skewed to the right: most hourly volumes are low, which is consistent with lighter traffic than in the daytime. The median is 1,287 cars, and 75% of nighttime hours have 2,819 cars or fewer, so volumes above 3,000 are uncommon at night.
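The direction of skew can be read by comparing the mean with the median, or computed with `Series.skew()`: a negative value indicates a tail toward low values (left skew), a positive value a tail toward high values (right skew). The sketch below uses two made-up samples, `left_tailed` and `right_tailed`, as stand-ins for the daytime and nighttime volumes; the numbers are illustrative only.

```python
import pandas as pd

# Hypothetical samples: one with a long tail toward low values,
# one with a long tail toward high values.
left_tailed = pd.Series([0, 3000, 4500, 4800, 5000, 5200, 5500])
right_tailed = pd.Series([300, 400, 500, 900, 1300, 2800, 6000])

for name, s in [("left_tailed", left_tailed), ("right_tailed", right_tailed)]:
    print(
        name,
        "mean:", round(s.mean(), 1),
        "median:", s.median(),
        "skew:", round(s.skew(), 2),
    )
```

In the left-tailed sample the mean falls below the median, and in the right-tailed sample it falls above, which mirrors the relationship seen in the two `describe()` outputs.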
# assign() returns a new DataFrame, which avoids SettingWithCopyWarning
daytime = daytime.assign(month=daytime['date_time'].dt.month)
# numeric_only=True skips the text columns, which cannot be averaged
by_month = daytime.groupby('month').mean(numeric_only=True)
by_month['traffic_volume']
month
1     4495.613727
2     4711.198394
3     4889.409560
4     4906.894305
5     4911.121609
6     4898.019566
7     4595.035744
8     4928.302035
9     4870.783145
10    4921.234922
11    4704.094319
12    4374.834566
Name: traffic_volume, dtype: float64
by_month['traffic_volume'].plot.line()
plt.show()
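Daytime traffic is lightest in the winter months, with averages of about 4,375 in December and 4,496 in January, and heaviest from March to October, when monthly averages stay near 4,900. The one exception in the warm season is July, which dips to about 4,595. One possible reading is that traffic follows the school year, but June and August stay high, so only July and the winter months stand out as lighter.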
# dayofweek: 0 is Monday, 6 is Sunday; assign() avoids SettingWithCopyWarning
daytime = daytime.assign(days=daytime['date_time'].dt.dayofweek)
# numeric_only=True skips the text columns, which cannot be averaged
by_weekday = daytime.groupby('days').mean(numeric_only=True)
by_weekday['traffic_volume']
days
0    4893.551286
1    5189.004782
2    5284.454282
3    5311.303730
4    5291.600829
5    3927.249558
6    3436.541789
Name: traffic_volume, dtype: float64
by_weekday['traffic_volume'].plot.line()
plt.show()
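Daytime traffic is heavier on business days than on weekends. With dt.dayofweek coding Monday as 0 and Sunday as 6, the business-day averages range from about 4,894 (Monday) to 5,311 (Thursday), while Saturday and Sunday drop to about 3,927 and 3,437.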
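Because `dt.dayofweek` returns integers, the grouped output is easier to read when the days are labelled by name. The sketch below shows one way to do that with `dt.day_name()`, using a hypothetical four-row frame `toy`; the `is_weekend` column is an illustrative helper, not something the notebook defines.

```python
import pandas as pd

# Hypothetical frame with the same column names as the dataset.
toy = pd.DataFrame({
    "date_time": pd.to_datetime([
        "2012-10-01 09:00:00",  # Monday
        "2012-10-01 10:00:00",  # Monday
        "2012-10-06 09:00:00",  # Saturday
        "2012-10-07 09:00:00",  # Sunday
    ]),
    "traffic_volume": [5000, 4800, 3900, 3400],
})

# day_name() gives English weekday names; dayofweek >= 5 marks Saturday/Sunday.
toy["day_name"] = toy["date_time"].dt.day_name()
toy["is_weekend"] = toy["date_time"].dt.dayofweek >= 5

by_day = toy.groupby("day_name")["traffic_volume"].mean()
print(by_day)
print(toy.groupby("is_weekend")["traffic_volume"].mean())
```

Note that grouping by name sorts the days alphabetically, so for a line plot in calendar order it is usually simpler to group by the integer code and relabel the axis afterwards.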
# assign() returns a new DataFrame, which avoids SettingWithCopyWarning
daytime = daytime.assign(hour=daytime['date_time'].dt.hour)
businessday = daytime[daytime['days'] <= 4]  # Monday (0) to Friday (4)
weekend = daytime[daytime['days'] > 4]  # Saturday (5) and Sunday (6)
# numeric_only=True skips the text columns, which cannot be averaged
by_hour_business = businessday.groupby('hour').mean(numeric_only=True)
by_hour_weekend = weekend.groupby('hour').mean(numeric_only=True)
by_hour_business['traffic_volume']
hour
7     6030.413559
8     5503.497970
9     4895.269257
10    4378.419118
11    4633.419470
12    4855.382143
13    4859.180473
14    5152.995778
15    5592.897768
16    6189.473647
17    5784.827133
18    4434.209431
Name: traffic_volume, dtype: float64
by_hour_weekend['traffic_volume']
hour
7     1589.365894
8     2338.578073
9     3111.623917
10    3686.632302
11    4044.154955
12    4372.482883
13    4362.296564
14    4358.543796
15    4342.456881
16    4339.693805
17    4151.919929
18    3811.792279
Name: traffic_volume, dtype: float64
plt.figure(figsize=(10,6))
plt.subplot(1,2,1)
plt.plot(by_hour_business['traffic_volume'])
plt.title('Traffic during business days')
plt.xlabel('Daytime hours')
plt.ylabel('Traffic volume')
plt.xlim(6,19)
plt.ylim(0,7000)
plt.subplot(1,2,2)
plt.plot(by_hour_weekend['traffic_volume'])
plt.title('Traffic during weekends')
plt.xlabel('Daytime hours')
plt.ylabel('Traffic volume')
plt.xlim(6,19)
plt.ylim(0,7000)
plt.show()
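On business days, traffic peaks at the beginning and end of the daytime window: about 6,030 cars at 7:00 and about 6,189 at 16:00, with a midday low of about 4,378 at 10:00. On weekends there is no morning rush: volume climbs steadily from about 1,589 at 7:00 to about 4,372 at 12:00, then stays between roughly 4,150 and 4,370 until 17:00 before easing to about 3,812 at 18:00.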
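The rush hours can also be located programmatically rather than read off the plot. The sketch below rebuilds the business-day hourly means as a standalone series (values rounded from the `by_hour_business` output) and uses `Series.idxmax()` on the morning and afternoon halves; the names `business`, `morning_peak`, and `afternoon_peak`, and the split at 12:00, are illustrative choices.

```python
import pandas as pd

# Business-day hourly means, rounded from the by_hour_business output.
business = pd.Series(
    [6030.4, 5503.5, 4895.3, 4378.4, 4633.4, 4855.4,
     4859.2, 5153.0, 5592.9, 6189.5, 5784.8, 4434.2],
    index=range(7, 19),
    name="traffic_volume",
)

# .loc slices by label and includes both endpoints.
morning_peak = business.loc[7:12].idxmax()
afternoon_peak = business.loc[13:18].idxmax()

print("morning peak hour:", morning_peak)
print("afternoon peak hour:", afternoon_peak)
```

In the notebook itself, the same calls could be applied directly to `by_hour_business['traffic_volume']`.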
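# Restrict to numeric columns; recent pandas versions raise an error on text columns
I_94.select_dtypes('number').corr()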
| | temp | rain_1h | snow_1h | clouds_all | traffic_volume |
|---|---|---|---|---|---|
| temp | 1.000000 | 0.009069 | -0.019755 | -0.101976 | 0.130299 |
| rain_1h | 0.009069 | 1.000000 | -0.000090 | 0.004818 | 0.004714 |
| snow_1h | -0.019755 | -0.000090 | 1.000000 | 0.027931 | 0.000733 |
| clouds_all | -0.101976 | 0.004818 | 0.027931 | 1.000000 | 0.067054 |
| traffic_volume | 0.130299 | 0.004714 | 0.000733 | 0.067054 | 1.000000 |
I_94.plot.scatter(x='temp', y='traffic_volume')
plt.show()
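The correlations between the numeric weather columns and traffic volume are all weak. The strongest is temp at about 0.13, followed by clouds_all at about 0.07, while rain_1h and snow_1h are close to zero. Consistent with that, the scatter plot of temperature against traffic volume shows no clear pattern, so none of these variables looks like a reliable indicator of heavy traffic.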
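When only the relationship with one target column matters, it can be easier to pull that single column out of the correlation matrix and rank it by absolute value. The sketch below does this on a hypothetical five-row frame `toy`; the values are constructed so that `temp` correlates perfectly with `traffic_volume` and `rain_1h` not at all, which is not what the real data shows.

```python
import pandas as pd

# Hypothetical frame; the text column is there to show that it gets excluded.
toy = pd.DataFrame({
    "temp": [270.0, 275.0, 280.0, 285.0, 290.0],
    "rain_1h": [0.0, 0.0, 1.0, 0.0, 0.0],
    "weather_main": ["Snow", "Clouds", "Rain", "Clear", "Clear"],
    "traffic_volume": [3000, 3500, 4000, 4500, 5000],
})

corr_with_traffic = (
    toy.select_dtypes("number")          # keep numeric columns only
       .corr()["traffic_volume"]         # correlations with the target column
       .drop("traffic_volume")           # drop the trivial self-correlation
       .sort_values(key=abs, ascending=False)  # strongest first, sign preserved
)
print(corr_with_traffic)
```

Sorting with `key=abs` keeps the sign of each coefficient visible while still ranking strong negative correlations ahead of weak positive ones.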
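# numeric_only=True skips the text columns, which cannot be averaged
by_weather_main = daytime.groupby('weather_main').mean(numeric_only=True)
by_weather_description = daytime.groupby('weather_description').mean(numeric_only=True)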
plt.figure(figsize=(10,6))
by_weather_main['traffic_volume'].plot.barh()
plt.show()
In the bar plot, no weather type has an average daytime traffic volume above 5,000 cars. The rain-related categories (Thunderstorm, Rain, Drizzle) appear to have some effect on traffic volume.
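A group mean can be misleading when a weather category has very few observations, so it may help to report the count next to the mean before comparing bars. The sketch below shows the idea on a hypothetical six-row frame `toy`; the categories and volumes are made up for illustration.

```python
import pandas as pd

# Hypothetical frame with the same column names as the dataset.
toy = pd.DataFrame({
    "weather_main": ["Clear", "Clear", "Rain", "Rain", "Rain", "Squall"],
    "traffic_volume": [4800, 4600, 4900, 5000, 4800, 2000],
})

summary = (
    toy.groupby("weather_main")["traffic_volume"]
       .agg(["mean", "count"])              # average volume and number of rows
       .sort_values("mean", ascending=False)
)
print(summary)
```

Here the "Squall" mean rests on a single row, which is the kind of category whose bar deserves caution.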
CONCLUSIONS