This project analyzes westbound traffic on I-94, an east–west Interstate Highway connecting the Great Lakes and northern Great Plains regions of the United States.
John Hogue made the dataset available, and you can download it from the UCI Machine Learning Repository.
The goal of our analysis is to determine a few indicators of heavy traffic on I-94. These indicators can be weather type, time of the day, time of the week, etc. For instance, we may find out that the traffic is usually heavier in the summer or when it snows.
The dataset documentation mentions that a station located approximately midway between Minneapolis and Saint Paul recorded the traffic data.
The station only records westbound traffic (cars moving from east to west).
# Import pandas library
import pandas as pd
# Creating 'df' DataFrame by using pandas.
df = pd.read_csv('Metro_Interstate_Traffic_Volume.csv')
df.head(5)
| | holiday | temp | rain_1h | snow_1h | clouds_all | weather_main | weather_description | date_time | traffic_volume |
---|---|---|---|---|---|---|---|---|---|
0 | None | 288.28 | 0.0 | 0.0 | 40 | Clouds | scattered clouds | 2012-10-02 09:00:00 | 5545 |
1 | None | 289.36 | 0.0 | 0.0 | 75 | Clouds | broken clouds | 2012-10-02 10:00:00 | 4516 |
2 | None | 289.58 | 0.0 | 0.0 | 90 | Clouds | overcast clouds | 2012-10-02 11:00:00 | 4767 |
3 | None | 290.13 | 0.0 | 0.0 | 90 | Clouds | overcast clouds | 2012-10-02 12:00:00 | 5026 |
4 | None | 291.14 | 0.0 | 0.0 | 75 | Clouds | broken clouds | 2012-10-02 13:00:00 | 4918 |
df.tail(5)
| | holiday | temp | rain_1h | snow_1h | clouds_all | weather_main | weather_description | date_time | traffic_volume |
---|---|---|---|---|---|---|---|---|---|
48199 | None | 283.45 | 0.0 | 0.0 | 75 | Clouds | broken clouds | 2018-09-30 19:00:00 | 3543 |
48200 | None | 282.76 | 0.0 | 0.0 | 90 | Clouds | overcast clouds | 2018-09-30 20:00:00 | 2781 |
48201 | None | 282.73 | 0.0 | 0.0 | 90 | Thunderstorm | proximity thunderstorm | 2018-09-30 21:00:00 | 2159 |
48202 | None | 282.09 | 0.0 | 0.0 | 90 | Clouds | overcast clouds | 2018-09-30 22:00:00 | 1450 |
48203 | None | 282.12 | 0.0 | 0.0 | 90 | Clouds | overcast clouds | 2018-09-30 23:00:00 | 954 |
df.columns
Index(['holiday', 'temp', 'rain_1h', 'snow_1h', 'clouds_all', 'weather_main', 'weather_description', 'date_time', 'traffic_volume'], dtype='object')
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48204 entries, 0 to 48203
Data columns (total 9 columns):

| # | Column | Non-Null Count | Dtype |
---|---|---|---|
0 | holiday | 48204 non-null | object |
1 | temp | 48204 non-null | float64 |
2 | rain_1h | 48204 non-null | float64 |
3 | snow_1h | 48204 non-null | float64 |
4 | clouds_all | 48204 non-null | int64 |
5 | weather_main | 48204 non-null | object |
6 | weather_description | 48204 non-null | object |
7 | date_time | 48204 non-null | object |
8 | traffic_volume | 48204 non-null | int64 |

dtypes: float64(3), int64(2), object(4)
memory usage: 3.3+ MB
There are 9 columns in total.
Each column has 48,204 entries.
No column has null values.
The data spans six years, from October 2012 to September 2018.
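The two checks behind these observations (no missing values, and the time span covered) can be reproduced directly. The sketch below uses a hypothetical three-row stand-in called `sample` rather than the real CSV, and its middle row is invented for illustration; on the real data the same calls would be applied to `df`.

```python
import pandas as pd

# Hypothetical three-row stand-in for the real CSV, for illustration only.
sample = pd.DataFrame({
    "date_time": ["2012-10-02 09:00:00", "2015-06-01 12:00:00", "2018-09-30 23:00:00"],
    "traffic_volume": [5545, 4000, 954],
})

# Count missing values per column; all zeros means there are no nulls.
null_counts = sample.isnull().sum()

# Parse the timestamps and read off the earliest and latest record.
timestamps = pd.to_datetime(sample["date_time"])
first, last = timestamps.min(), timestamps.max()

print(null_counts)
print(first, "to", last)
```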
# Import matplotlib library to plot graph.
import matplotlib.pyplot as plt
%matplotlib inline
# plot histogram to see distribution of traffic volume column of DataFrame.
plt.hist(df["traffic_volume"])
plt.xlabel("Volume of Traffic")
plt.ylabel("Frequency")
plt.title("Distribution of Traffic")
plt.show()
df["traffic_volume"].describe()
| | traffic_volume |
---|---|
count | 48204.000000 |
mean | 3259.818355 |
std | 1986.860670 |
min | 0.000000 |
25% | 1193.000000 |
50% | 3380.000000 |
75% | 4933.000000 |
max | 7280.000000 |

Name: traffic_volume, dtype: float64
About 25% of the time, traffic volume is low, at 1,193 or below.
About 25% of the time, traffic volume is high, at 4,933 or above.
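The quartiles quoted above can also be pulled out individually with `Series.quantile`. This is a minimal sketch on made-up numbers: `volumes` is a hypothetical toy Series, not the real `traffic_volume` column.

```python
import pandas as pd

# Hypothetical toy volumes, not the real data.
volumes = pd.Series([0, 500, 1200, 3400, 4900, 5500, 7280])

# quantile() accepts a list and returns one value per requested quantile.
q1, q3 = volumes.quantile([0.25, 0.75])

# Share of records at or below the first quartile and at or above the third.
share_low = (volumes <= q1).mean()
share_high = (volumes >= q3).mean()

print(q1, q3)
print(share_low, share_high)
```

On the real data, the same call on `df["traffic_volume"]` should return the 25% and 75% rows of the `describe()` output.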
There is a possibility that nighttime & daytime might influence the traffic volume.
We will divide the dataset into two parts:
- **Daytime data**: records with hours 8 through 18, i.e. from 8 a.m. up to 7 p.m. (11 hours)
- **Nighttime data**: records with hours 19 through 7, i.e. from 7 p.m. up to 8 a.m. the next morning (13 hours)
# Converting 'date_time' column into date_time object.
df["date_time"] = pd.to_datetime(df["date_time"])
# Creating new column 'time' to get time(in hrs) during record of data
df["time"] = df["date_time"].dt.hour
df["time"].dtype
dtype('int64')
# Creating daytime data_set
combined = (df["time"] > 7) & (df["time"] < 19)
day_data = df[combined]
day_data.head()
| | holiday | temp | rain_1h | snow_1h | clouds_all | weather_main | weather_description | date_time | traffic_volume | time |
---|---|---|---|---|---|---|---|---|---|---|
0 | None | 288.28 | 0.0 | 0.0 | 40 | Clouds | scattered clouds | 2012-10-02 09:00:00 | 5545 | 9 |
1 | None | 289.36 | 0.0 | 0.0 | 75 | Clouds | broken clouds | 2012-10-02 10:00:00 | 4516 | 10 |
2 | None | 289.58 | 0.0 | 0.0 | 90 | Clouds | overcast clouds | 2012-10-02 11:00:00 | 4767 | 11 |
3 | None | 290.13 | 0.0 | 0.0 | 90 | Clouds | overcast clouds | 2012-10-02 12:00:00 | 5026 | 12 |
4 | None | 291.14 | 0.0 | 0.0 | 75 | Clouds | broken clouds | 2012-10-02 13:00:00 | 4918 | 13 |
# Creating nighttime data_set
combined = (df["time"] > 7) & (df["time"] < 19)
night_data = df[~combined]
night_data.head()
| | holiday | temp | rain_1h | snow_1h | clouds_all | weather_main | weather_description | date_time | traffic_volume | time |
---|---|---|---|---|---|---|---|---|---|---|
10 | None | 290.97 | 0.0 | 0.0 | 20 | Clouds | few clouds | 2012-10-02 19:00:00 | 3539 | 19 |
11 | None | 289.38 | 0.0 | 0.0 | 1 | Clear | sky is clear | 2012-10-02 20:00:00 | 2784 | 20 |
12 | None | 288.61 | 0.0 | 0.0 | 1 | Clear | sky is clear | 2012-10-02 21:00:00 | 2361 | 21 |
13 | None | 287.16 | 0.0 | 0.0 | 1 | Clear | sky is clear | 2012-10-02 22:00:00 | 1529 | 22 |
14 | None | 285.45 | 0.0 | 0.0 | 1 | Clear | sky is clear | 2012-10-02 23:00:00 | 963 | 23 |
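
Because the mask uses strict comparisons (`> 7` and `< 19`), it is worth checking which hours end up in each subset. The sketch below does so on a hypothetical Series with one entry per hour of the day; `hours`, `is_day`, `day_hours` and `night_hours` are illustrative names, not part of the analysis.

```python
import pandas as pd

# One hypothetical record per hour of the day (0 to 23).
hours = pd.Series(range(24), name="time")

# Same strict comparisons as in the cells above.
is_day = (hours > 7) & (hours < 19)

day_hours = hours[is_day].tolist()
night_hours = hours[~is_day].tolist()

# between() is inclusive on both ends by default, so this is an equivalent mask.
assert is_day.equals(hours.between(8, 18))

print(day_hours)
print(len(day_hours), len(night_hours))
```

The daytime subset therefore covers hours 8 to 18, and the nighttime subset covers the remaining 13 hours.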
# Defining size of canvas.
plt.figure(figsize = (10,8))
# Creating day time plot for frequency of traffic volume.
plt.subplot(1,2,1)
plt.hist(day_data["traffic_volume"])
plt.title("Day Traffic Volume")
plt.xlabel("Volume of Traffic")
plt.ylabel("Frequency")
plt.xlim(0,7000)
# Creating night time plot for frequency of traffic volume.
plt.subplot(1,2,2)
plt.hist(night_data["traffic_volume"])
plt.title("Night Traffic Volume")
plt.xlabel("Volume of Traffic")
plt.ylabel("Frequency")
plt.xlim(0,7000)
plt.show()
day_data["traffic_volume"].describe()
| | traffic_volume |
---|---|
count | 21798.000000 |
mean | 4764.132948 |
std | 1021.369570 |
min | 0.000000 |
25% | 4271.000000 |
50% | 4792.000000 |
75% | 5410.000000 |
max | 7280.000000 |

Name: traffic_volume, dtype: float64
night_data["traffic_volume"].describe()
| | traffic_volume |
---|---|
count | 26406.000000 |
mean | 2018.015375 |
std | 1713.201969 |
min | 0.000000 |
25% | 581.000000 |
50% | 1485.000000 |
75% | 2934.000000 |
max | 7260.000000 |

Name: traffic_volume, dtype: float64
During the day, the middle 50% of records have a traffic volume between 4,271 and 5,410.
At night, about 75% of records have a traffic volume below 2,934.
We can conclude that traffic is much lighter at night than during the day. Since our goal is to find indicators of heavy traffic, we analyze only the daytime data from here on.
# Creating new column 'month' to get month during record of data
day_data["month"] = day_data['date_time'].dt.month
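# numeric_only=True averages only the numeric columns; pandas 2.x raises an error on text columns otherwise.
by_month = day_data.groupby('month').mean(numeric_only=True)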
by_month = by_month.reset_index()
by_month
<ipython-input-41-62ef0351e4d9>:3: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
| | month | temp | rain_1h | snow_1h | clouds_all | traffic_volume | time |
---|---|---|---|---|---|---|---|
0 | 1 | 265.610396 | 0.015543 | 0.000692 | 58.614160 | 4499.832053 | 12.901207 |
1 | 2 | 267.190126 | 0.004091 | 0.000000 | 51.797986 | 4705.570170 | 12.873505 |
2 | 3 | 274.005655 | 0.016981 | 0.000000 | 56.890018 | 4896.060371 | 12.902570 |
3 | 4 | 280.040564 | 0.107413 | 0.000000 | 59.176874 | 4887.885428 | 13.008448 |
4 | 5 | 289.754275 | 0.138673 | 0.000000 | 57.174344 | 4901.648341 | 13.001981 |
5 | 6 | 295.068585 | 0.250047 | 0.000000 | 49.030409 | 4905.114035 | 13.000000 |
6 | 7 | 297.301008 | 4.780495 | 0.000000 | 42.222017 | 4595.017576 | 12.926457 |
7 | 8 | 295.664621 | 0.225874 | 0.000000 | 42.723892 | 4918.958227 | 12.933775 |
8 | 9 | 292.927130 | 0.276739 | 0.000000 | 45.394830 | 4870.988249 | 12.912456 |
9 | 10 | 284.455871 | 0.017481 | 0.000000 | 54.050625 | 4934.438125 | 13.035625 |
10 | 11 | 276.962591 | 0.006747 | 0.000000 | 57.025210 | 4698.226291 | 13.018607 |
11 | 12 | 267.920650 | 0.037282 | 0.002347 | 67.122122 | 4422.761261 | 12.885886 |
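
The `SettingWithCopyWarning` shown before the table appears because `day_data` was created by filtering `df`, so pandas cannot tell whether the new column is meant for a copy or for the original. One common way to avoid it, sketched here under the assumption that an independent daytime DataFrame is what is wanted, is to call `.copy()` when creating the subset. The names `df_demo` and `day_demo` and the third row are hypothetical.

```python
import pandas as pd

# Small stand-in for df; the third row is invented for illustration.
df_demo = pd.DataFrame({
    "date_time": pd.to_datetime(
        ["2012-10-02 09:00:00", "2012-10-02 21:00:00", "2012-11-05 10:00:00"]
    ),
    "traffic_volume": [5545, 2361, 4800],
})
hour = df_demo["date_time"].dt.hour
is_day = (hour > 7) & (hour < 19)

# .copy() makes the subset an independent DataFrame, so adding a column
# neither touches df_demo nor triggers SettingWithCopyWarning.
day_demo = df_demo[is_day].copy()
day_demo["month"] = day_demo["date_time"].dt.month

print(day_demo)
```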
# Calculating variation between maximum & minimum mean traffic volume grouped
# by month.
x = by_month["traffic_volume"].max() - by_month["traffic_volume"].min()
Variation = (x*100)/(by_month["traffic_volume"].min())
print( Variation )
11.5691721418583
# Plotting line graph for month vs number of vehicle.
plt.plot(by_month["month"],by_month["traffic_volume"],color = "Blue")
plt.xlim(1,12)
plt.xlabel("Months")
plt.ylabel("Number of Vehicles")
plt.title("Month vs Traffic")
plt.show()
Traffic volume is high and nearly constant from March to October.
Traffic volume is slightly lower in the winter months (November to February).
July deviates from the trend and shows a sudden drop in traffic volume.
The difference between the lowest and highest monthly averages is only about 11.57%.
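As a check on the arithmetic, the percentage can be recomputed by hand from the two extreme monthly means in the table (October and December). The variable names below are illustrative.

```python
# Highest and lowest monthly means, copied from the by_month table
# (October and December respectively).
highest = 4934.438125
lowest = 4422.761261

# Relative spread: how much larger the maximum is than the minimum, in percent.
variation = (highest - lowest) / lowest * 100
print(round(variation, 2))
```

Note that the spread is expressed relative to the minimum; dividing by the maximum or the mean instead would give a slightly smaller figure.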
# Creating new column 'day_week' to get day of week during record of data
day_data["day_week"] = day_data["date_time"].dt.dayofweek
by_dayofweek = day_data.groupby("day_week").mean(numeric_only=True)
by_dayofweek = by_dayofweek.reset_index()
by_dayofweek
<ipython-input-44-dd921e1f494b>:3: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
| | day_week | temp | rain_1h | snow_1h | clouds_all | traffic_volume | time | month |
---|---|---|---|---|---|---|---|---|
0 | 0 | 282.558381 | 3.192710 | 0.000019 | 58.087024 | 4807.164138 | 12.998142 | 6.408486 |
1 | 1 | 282.637195 | 0.113711 | 0.000180 | 52.385495 | 5109.419471 | 12.869977 | 6.474028 |
2 | 2 | 282.559503 | 0.072698 | 0.001192 | 53.852568 | 5207.297083 | 12.956563 | 6.633164 |
3 | 3 | 282.620872 | 0.166149 | 0.000162 | 54.300779 | 5228.966580 | 12.984101 | 6.515574 |
4 | 4 | 282.541041 | 0.096456 | 0.000246 | 51.827497 | 5220.602140 | 12.972763 | 6.593709 |
5 | 5 | 282.694025 | 0.111649 | 0.000103 | 50.681539 | 4119.368251 | 12.916263 | 6.500485 |
6 | 6 | 282.705909 | 0.095725 | 0.000000 | 52.319871 | 3652.753150 | 12.945396 | 6.609047 |
# Calculating variation between maximum & minimum mean traffic volume grouped
# by dayofweek.
x = by_dayofweek["traffic_volume"].max() - by_dayofweek["traffic_volume"].min()
Variation = (x*100)/(by_dayofweek["traffic_volume"].min())
print( Variation )
43.15138102874187
# Plotting line graph for day of week vs number of vehicles.
plt.plot(by_dayofweek["day_week"],by_dayofweek["traffic_volume"],color = "Blue")
plt.xlim(0,6)
plt.xlabel("day_of_week")
plt.ylabel("Number of Vehicles")
plt.title("Day vs Traffic")
plt.show()
Traffic volume drops sharply on weekends.
On weekdays, traffic volume is nearly constant from Tuesday to Friday, peaking on Thursday; Monday is somewhat lower.
The difference between the lowest and highest daily averages is 43.15%.
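The `day_week` values come from `dt.dayofweek`, which counts from 0 (Monday) to 6 (Sunday). A small lookup makes the table and plot easier to read. The sketch below builds one for the week starting Monday 2012-10-01 (the dataset's first record falls on 2012-10-02); `dates` and `lookup` are illustrative names.

```python
import pandas as pd

# Seven consecutive days starting on a Monday.
dates = pd.Series(pd.date_range("2012-10-01", periods=7, freq="D"))

lookup = pd.DataFrame({
    "day_week": dates.dt.dayofweek,   # 0 = Monday ... 6 = Sunday
    "day_name": dates.dt.day_name(),
})
print(lookup)
```

With this mapping, `day_week` 3 is Thursday, and values 5 and 6 are Saturday and Sunday.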
# Creating 'hour' column with the hour of each record (same values as 'time').
day_data['hour'] = day_data['date_time'].dt.hour
# Creating business days & weekend dataframe separately.
business_days = day_data[day_data['day_week'] <= 4].copy()
weekend = day_data[day_data['day_week'] >= 5].copy()
# Averaging the numeric columns by hour, separately for business days & weekends.
by_hour_business = business_days.groupby('hour').mean(numeric_only=True)
by_hour_weekend = weekend.groupby('hour').mean(numeric_only=True)
<ipython-input-47-f72c3f42cb28>:2: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
by_hour_business = by_hour_business.reset_index()
by_hour_business
| | hour | temp | rain_1h | snow_1h | clouds_all | traffic_volume | time | month | day_week |
---|---|---|---|---|---|---|---|---|---|
0 | 8 | 278.938443 | 0.144614 | 0.000135 | 53.666441 | 5503.497970 | 8.0 | 6.567659 | 1.989175 |
1 | 9 | 279.628421 | 0.156829 | 0.000139 | 53.619709 | 4895.269257 | 9.0 | 6.484386 | 1.981263 |
2 | 10 | 280.664650 | 0.113984 | 0.000033 | 54.781417 | 4378.419118 | 10.0 | 6.481283 | 1.957888 |
3 | 11 | 281.850231 | 0.151976 | 0.000000 | 52.808876 | 4633.419470 | 11.0 | 6.448819 | 1.979957 |
4 | 12 | 282.832763 | 0.090271 | 0.001543 | 53.855714 | 4855.382143 | 12.0 | 6.569286 | 1.989286 |
5 | 13 | 283.292447 | 0.092433 | 0.000370 | 53.325444 | 4859.180473 | 13.0 | 6.465237 | 1.982988 |
6 | 14 | 284.091787 | 0.102991 | 0.000746 | 55.326531 | 5152.995778 | 14.0 | 6.588318 | 1.990852 |
7 | 15 | 284.450605 | 0.090036 | 0.000274 | 54.168467 | 5592.897768 | 15.0 | 6.541397 | 1.962563 |
8 | 16 | 284.399011 | 0.118180 | 0.000632 | 54.444132 | 6189.473647 | 16.0 | 6.580464 | 1.995081 |
9 | 17 | 284.263033 | 7.299358 | 0.000000 | 55.204960 | 5784.827133 | 17.0 | 6.510576 | 1.994165 |
10 | 18 | 284.388061 | 0.121533 | 0.000125 | 54.183079 | 4434.209431 | 18.0 | 6.529126 | 1.988211 |
by_hour_weekend = by_hour_weekend.reset_index()
# Defining size of canvas.
plt.figure(figsize =(12,8))
# Plotting line graph for number of vehicles on business days.
plt.subplot(1,2,1)
plt.plot(by_hour_business["hour"],by_hour_business["traffic_volume"])
plt.xlabel("Hour")
plt.ylabel("traffic volume")
plt.ylim(2250,6250)
plt.title("Traffic vs Hour on Weekdays")
# Plotting line graph for number of vehicles on weekends.
plt.subplot(1,2,2)
plt.plot(by_hour_weekend["hour"],by_hour_weekend["traffic_volume"])
plt.xlabel("Hour")
plt.ylabel("traffic volume")
plt.ylim(2250,6250)
plt.title("Traffic vs Hour on Weekend")
plt.show()
As expected, traffic volume is higher on weekdays than on weekends.
On business days, traffic dips during working hours, roughly between 9 a.m. and 3 p.m.
Traffic volume is very high in the morning and evening on business days, most likely because people commute to work and then return home.
The evening rush is generally heavier than the morning rush on weekdays. One possible explanation is that people with flexible schedules, such as freelancers, tend to leave home in the evening.
On weekends, traffic volume increases until noon, stays roughly constant until 4 p.m., and then decreases steadily.
This may be because people go out late in the morning and return in the evening.
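The business-day peaks can also be read off programmatically rather than from the plot. The sketch below re-enters the business-day means from the `by_hour_business` table (rounded to one decimal) as a Series indexed by hour; `by_hour`, `morning_peak` and `evening_peak` are illustrative names, and splitting the day at noon is an assumption made for this example.

```python
import pandas as pd

# Business-day mean traffic volume per hour, rounded from the table above.
by_hour = pd.Series(
    [5503.5, 4895.3, 4378.4, 4633.4, 4855.4, 4859.2,
     5153.0, 5592.9, 6189.5, 5784.8, 4434.2],
    index=range(8, 19),
    name="traffic_volume",
)

# idxmax() returns the index label (here, the hour) of the largest value.
# .loc slices by label and includes both end points.
morning_peak = by_hour.loc[:11].idxmax()
evening_peak = by_hour.loc[12:].idxmax()
print(morning_peak, evening_peak)
```

Within the 8-to-18 daytime window, the morning maximum falls at hour 8 and the afternoon maximum at hour 16.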
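# Restricting to numeric columns so that corr() does not fail on text columns.
df.select_dtypes('number').corr()['traffic_volume']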
temp 0.130299 rain_1h 0.004714 snow_1h 0.000733 clouds_all 0.067054 traffic_volume 1.000000 time 0.352401 Name: traffic_volume, dtype: float64
df['temp'].corr(df['traffic_volume'])
0.13029879817112658
by_temp = df.groupby('temp').mean(numeric_only=True)
by_temp = by_temp.reset_index()
by_temp
| | temp | rain_1h | snow_1h | clouds_all | traffic_volume | time |
---|---|---|---|---|---|---|
0 | 0.00 | 0.0 | 0.0 | 0.0 | 1318.2 | 5.1 |
1 | 243.39 | 0.0 | 0.0 | 1.0 | 1462.0 | 8.0 |
2 | 243.62 | 0.0 | 0.0 | 1.0 | 1037.0 | 7.0 |
3 | 244.22 | 0.0 | 0.0 | 1.0 | 800.0 | 6.0 |
4 | 244.82 | 0.0 | 0.0 | 11.0 | 483.0 | 4.0 |
... | ... | ... | ... | ... | ... | ... |
5838 | 308.87 | 0.0 | 0.0 | 40.0 | 4798.0 | 18.0 |
5839 | 308.95 | 0.0 | 0.0 | 40.0 | 3812.0 | 15.0 |
5840 | 309.08 | 0.0 | 0.0 | 40.0 | 5314.0 | 17.0 |
5841 | 309.29 | 0.0 | 0.0 | 40.0 | 5902.0 | 16.0 |
5842 | 310.07 | 0.0 | 0.0 | 75.0 | 3810.0 | 16.0 |
5843 rows × 6 columns
The correlation between each of the weather columns and traffic volume is very weak.
The highest weather-related correlation is between temperature and traffic volume, at only 0.130299.
Let's analyze temperature vs. traffic volume further.
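For reference, the value returned by `corr` is by default the Pearson coefficient: the covariance of the two columns divided by the product of their standard deviations. The sketch below checks this on hypothetical toy values (`temp` and `volume` here are made-up Series, not the real columns).

```python
import pandas as pd

# Hypothetical toy values, chosen only to keep the example small.
temp = pd.Series([270.0, 280.0, 290.0, 300.0])
volume = pd.Series([3000.0, 5000.0, 4000.0, 4500.0])

# Pearson r = covariance / (std_x * std_y); pandas uses the sample (n - 1)
# versions of both, so the normalisation cancels out.
manual_r = temp.cov(volume) / (temp.std() * volume.std())
pandas_r = temp.corr(volume)

print(manual_r, pandas_r)
```

A coefficient near 0, such as the 0.13 found above, indicates that there is almost no linear relationship; it does not rule out a non-linear one, which is one reason to look at a scatter plot as well.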
# Plotting scatter plot of temperature vs traffic volume.
plt.scatter(by_temp["temp"],by_temp['traffic_volume'])
plt.xlabel("temp")
plt.ylabel("traffic volume")
plt.xlim(230,320)
plt.title("temp vs traffic volume")
plt.show()
The temperature values are concentrated between 245 and 305 kelvin.
With roughly 5,800 points (one per distinct temperature value), the graph appears as a solid block.
Let's plot scatter graphs with fewer rows each.
# Defining size of the canvas.
plt.figure(figsize = (10,12))
# Plotting scatter plot with all rows(apprx. 6000).
plt.subplot(2,2,1)
plt.scatter(by_temp.loc[:,"temp"],by_temp.loc[:,'traffic_volume'])
plt.xlabel("temp")
plt.ylabel("traffic volume")
plt.xlim(240,320)
plt.title("temp vs traffic volume")
# Plotting scatter plot with first 2000 rows.
plt.subplot(2,2,2)
plt.scatter(by_temp.loc[0:2000,"temp"],by_temp.loc[0:2000,'traffic_volume'])
plt.xlabel("temp")
plt.ylabel("traffic volume")
plt.xlim(240,320)
plt.title("temp vs traffic volume")
#Plotting scatter plot with next 2000 rows.
plt.subplot(2,2,3)
plt.scatter(by_temp.loc[2000:4000,"temp"],by_temp.loc[2000:4000,'traffic_volume'])
plt.xlabel("temp")
plt.ylabel("traffic volume")
plt.xlim(240,320)
plt.title("temp vs traffic volume")
# Plotting scatter plot with last 2000 rows.
plt.subplot(2,2,4)
plt.scatter(by_temp.loc[4000:6000,"temp"],by_temp.loc[4000:6000,'traffic_volume'])
plt.xlabel("temp")
plt.ylabel("traffic volume")
plt.xlim(240,320)
plt.title("temp vs traffic volume")
plt.show()
From the graphs above, we can infer that traffic volume is spread evenly across the 245–305 kelvin temperature range, with no visible trend.
Overall, the numerical weather columns are not reliable indicators of heavy traffic.
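One caveat about the panels above: `by_temp` is sorted by temperature, so each label slice such as `.loc[0:2000]` covers a single temperature band rather than a thinned-out version of the whole plot (and, because `.loc` includes both end points, it holds 2,001 rows). An alternative way to reduce the number of points, offered here as a sketch and not as the method used in this analysis, is to draw a random sample. `by_temp_demo` is a hypothetical stand-in with invented values.

```python
import pandas as pd

# Hypothetical stand-in for by_temp: sorted by temperature, as groupby() leaves it.
by_temp_demo = pd.DataFrame({
    "temp": [245.0 + 0.01 * i for i in range(6000)],
    "traffic_volume": [(i * 37) % 7000 for i in range(6000)],
})

# A label slice of a sorted frame only covers the coldest band.
first_slice = by_temp_demo.loc[0:2000]

# A random sample thins the points while still spanning the whole range.
thinned = by_temp_demo.sample(n=2000, random_state=0)

print(len(first_slice), first_slice["temp"].max())
print(len(thinned), thinned["temp"].min(), thinned["temp"].max())
```

Passing `thinned["temp"]` and `thinned["traffic_volume"]` to `plt.scatter` would then give a single, less crowded plot over the full temperature range.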
# Creating a DataFrame restricted to temperatures between 245 and 305 kelvin.
a = day_data['temp'] > 245
b = day_data['temp'] < 305
combined = a & b
x = day_data[combined]
x
| | holiday | temp | rain_1h | snow_1h | clouds_all | weather_main | weather_description | date_time | traffic_volume | time | month | day_week | hour |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | None | 288.28 | 0.00 | 0.0 | 40 | Clouds | scattered clouds | 2012-10-02 09:00:00 | 5545 | 9 | 10 | 1 | 9 |
1 | None | 289.36 | 0.00 | 0.0 | 75 | Clouds | broken clouds | 2012-10-02 10:00:00 | 4516 | 10 | 10 | 1 | 10 |
2 | None | 289.58 | 0.00 | 0.0 | 90 | Clouds | overcast clouds | 2012-10-02 11:00:00 | 4767 | 11 | 10 | 1 | 11 |
3 | None | 290.13 | 0.00 | 0.0 | 90 | Clouds | overcast clouds | 2012-10-02 12:00:00 | 5026 | 12 | 10 | 1 | 12 |
4 | None | 291.14 | 0.00 | 0.0 | 75 | Clouds | broken clouds | 2012-10-02 13:00:00 | 4918 | 13 | 10 | 1 | 13 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
48194 | None | 283.84 | 0.00 | 0.0 | 75 | Rain | proximity shower rain | 2018-09-30 15:00:00 | 4302 | 15 | 9 | 6 | 15 |
48195 | None | 283.84 | 0.00 | 0.0 | 75 | Drizzle | light intensity drizzle | 2018-09-30 15:00:00 | 4302 | 15 | 9 | 6 | 15 |
48196 | None | 284.38 | 0.00 | 0.0 | 75 | Rain | light rain | 2018-09-30 16:00:00 | 4283 | 16 | 9 | 6 | 16 |
48197 | None | 284.79 | 0.00 | 0.0 | 75 | Clouds | broken clouds | 2018-09-30 17:00:00 | 4132 | 17 | 9 | 6 | 17 |
48198 | None | 284.20 | 0.25 | 0.0 | 75 | Rain | light rain | 2018-09-30 18:00:00 | 3947 | 18 | 9 | 6 | 18 |
21677 rows × 13 columns
day_data.shape
(21798, 13)
So, the given temperature range has only 121 fewer rows than the full daytime dataset (21,677 vs. 21,798).
# Calculating mean traffic volume grouped by 'weather_main' column.
by_weather_main = day_data.groupby('weather_main').mean(numeric_only=True)
by_weather_main = by_weather_main.reset_index()
# Plotting horizontal bar graph for traffic volume vs weather type.
plt.barh(by_weather_main["weather_main"],by_weather_main["traffic_volume"])
plt.show()
The weather type doesn't bring any significant change to the traffic volume.
However, Squall and Fog reduce the traffic volume to around 4,000.
Fog has the most negative impact on traffic volume among all weather types, possibly because of poor visibility.
Snow, Mist, and Haze also show a minor negative effect on traffic volume.
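When comparing group means like these, it helps to know how many rows each group contains, since a mean based on a handful of records is much less reliable. The sketch below computes the mean and the row count together with named aggregation and sorts the result; `demo` and `summary` are hypothetical names, and the rows are invented for illustration.

```python
import pandas as pd

# Hypothetical rows; the real day_data has about 21,800.
demo = pd.DataFrame({
    "weather_main": ["Clouds", "Clouds", "Clouds", "Fog", "Squall"],
    "traffic_volume": [5000, 4800, 5200, 4100, 4000],
})

# Named aggregation: mean volume and number of rows for each weather type,
# sorted so the lowest-traffic types come first.
summary = (
    demo.groupby("weather_main")
        .agg(mean_volume=("traffic_volume", "mean"),
             n_rows=("traffic_volume", "size"))
        .sort_values("mean_volume")
)
print(summary)
```

Applied to `day_data`, the `n_rows` column would show whether rare types such as Squall are backed by enough observations to support a conclusion.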
# Calculating mean traffic volume grouped by 'weather_description' column.
by_weather_description = day_data.groupby('weather_description').mean(numeric_only=True)
by_weather_description = by_weather_description.reset_index()
# Plotting horizontal bar graph for traffic volume vs weather description.
plt.figure(figsize = (10,20))
plt.barh(by_weather_description["weather_description"],by_weather_description["traffic_volume"])
plt.yticks(fontsize=14)
plt.show()
- The weather description doesn't bring significant changes to the traffic volume, except for **Thunderstorm with Drizzle**, which reduces the traffic volume to around **2000**.
- Other descriptions associated with lower traffic mainly include: Thunderstorm with Rain, Thunderstorm with Light Rain, Snow, Sleet, Proximity Thunderstorm with Rain, Light Snow, Mist, Light Shower Snow, Heavy Snow, Light Intensity Shower Rain, Fog, Freezing Rain, and Squalls.
- Some bad-weather descriptions are associated with higher traffic volume, mainly: Heavy Rain, Shower Drizzle, Proximity Thunderstorm with Drizzle, and Light Rain and Snow.
- This may be because these conditions are not severe: people still want to travel, but choose a car over alternatives such as cycling, walking, or waiting for a bus.
In this project, we tried to find a few indicators of heavy traffic on the I-94 Interstate highway. We managed to find two types of indicators:

Time indicators:
- Traffic is usually heavier during warm months (March–October) than during cold months (November–February).
- Traffic is usually heavier on business days than on weekends.
- On business days, within the 8-to-18 daytime window analyzed, the rush hours are around 8 and 16.

Weather indicators:
- Heavy rain
- Shower drizzle
- Light rain and snow
- Proximity thunderstorm with drizzle