As a part of this assignment, a data product was built to assist a real estate company to understand which zip codes are profitable for short term rentals within New York City. They have already decided to invest in Two Bedroom houses as they are found to be profitable. For this purpose, publicly available data from Zillow and AirBnB were used.
1.The investor will pay for the property in cash (i.e. no mortgage/interest rate will need to be accounted for).
2.The time value of money discount rate is 0% (i.e 1 dollar today is worth the same 100 years from now).
3.All properties and all square feet within each locale can be assumed to be homogeneous (i.e. a 1000 square foot property in a locale such as Bronx or Manhattan generates twice the revenue and costs twice as much as any other 500 square foot property within that same locale).
4.The occupancy rate is assumed to be 75%.
5.It was assumed that the real estate company wants to buy the property in April 2019, so I had performed time series forecasting to obtain the value of property in April 2019 considering the data from January 2010.
6.Since the data before January 2010 were showing abnormal pattern and also since most of them had missing values, the data from January 2010 were used.
7.Availability for 90 days was considered during the availability analysis oowing to the fact availability for 365 days were incorrect for some zipcodes.
8.Time for calculating the Breakeven Period was taken as 365 days(neglecting leap year) and thus the resultant Breakeven Period will be in years.
9.While performing time series analysis, models were not built step by step for each zipcodes because of obvious time constraints and considering the scope of the project. Thus AutoARIMA was used to do Model fitting and forecasting for all the individual zipcodes.
The project addressed the following quality checks,
• Checking for Duplicates
• Checking for Negative values
• Checking the Columns with Null Values
The project addressed the following quality checks,
• Cleaning up the Zillow Dataset.
• Cleaning up the Airbnb dataset.
• Imputing the missing values with group mean.
• Imputing some of the 0 values with the group mean.
• Filtering the Airbnb dataset for New York and 2 Bedrooms apartment.
• Grouping the Airbnb data by zipcode and taking mean.
• Joining the cleaned final Airbnb and Zillow dataset based on matching Zipcodes.
• Current Month Cost-April 2019(Calculated from Time series Forecasting). Also the cost for other months preceding Current Month were also added but they were not used anywhere.
• Neighbourhood group cleansed(Extracted from Airbnb data to add to final dataframe).
• Expected Revenue(Calulated Field).
• Breakeven Period(Calculated Field).
Here the Zillow dataset contains the Time series of cost data and the data from January 2010 was taken to fit the Time series model and forecast the cost values upto the Current Month Cost which is April 2019 for all the 21 Zipcodes. Here for this purpose the Pyamid ARIMA package was used to perform the AutoARIMA and get the forecast values.
The following plots and analysis was carried out in this project,
• Visualizing the Zipcodes in Real Time.
• Preliminary Review Analysis:
Review Accuracy vs Reviews Ratings
Reviews Ratings Vs No of Reviews Vs Neighbourhood
• Analysis of Availability:
Bar plots of Availability for all Zipcodes
Availability- Revenue Analysis
Availability- Cost Analysis
• Correlation between Revenue, Cost, Reviews and Availability
• Distribution of Revenue and Cost
Violin Plot of Revenue
Violin Plot of Cost
• Revenue-Cost Analysis
• Break Even Analysis
• 3-D Scatter Plot Between Revenue, Cost and Breakeven period
#Importing the Libraries
import pandas as pd
import numpy as np
# Matplotlib is for plotting graphs
import matplotlib.pyplot as plt
# matplotlib inline will display the graphs right below the cell
%matplotlib inline
# Plotly is used for Interactive Visualization
import plotly.plotly as py
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)
import plotly.graph_objs as go
# Importing the Respective Datasets
airbnb = pd.read_csv("listings.csv")
zillow = pd.read_csv("Zip_Zhvi_2bedroom.csv")
C:\Users\shari\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py:2698: DtypeWarning: Columns (43,88) have mixed types. Specify dtype option on import or set low_memory=False.
airbnb.shape
(40753, 95)
zillow.shape
(8946, 262)
# Function to return all the Columns in a List Format:
def cols(dataset):
return(dataset.columns.tolist())
cols(airbnb)
['id', 'listing_url', 'scrape_id', 'last_scraped', 'name', 'summary', 'space', 'description', 'experiences_offered', 'neighborhood_overview', 'notes', 'transit', 'access', 'interaction', 'house_rules', 'thumbnail_url', 'medium_url', 'picture_url', 'xl_picture_url', 'host_id', 'host_url', 'host_name', 'host_since', 'host_location', 'host_about', 'host_response_time', 'host_response_rate', 'host_acceptance_rate', 'host_is_superhost', 'host_thumbnail_url', 'host_picture_url', 'host_neighbourhood', 'host_listings_count', 'host_total_listings_count', 'host_verifications', 'host_has_profile_pic', 'host_identity_verified', 'street', 'neighbourhood', 'neighbourhood_cleansed', 'neighbourhood_group_cleansed', 'city', 'state', 'zipcode', 'market', 'smart_location', 'country_code', 'country', 'latitude', 'longitude', 'is_location_exact', 'property_type', 'room_type', 'accommodates', 'bathrooms', 'bedrooms', 'beds', 'bed_type', 'amenities', 'square_feet', 'price', 'weekly_price', 'monthly_price', 'security_deposit', 'cleaning_fee', 'guests_included', 'extra_people', 'minimum_nights', 'maximum_nights', 'calendar_updated', 'has_availability', 'availability_30', 'availability_60', 'availability_90', 'availability_365', 'calendar_last_scraped', 'number_of_reviews', 'first_review', 'last_review', 'review_scores_rating', 'review_scores_accuracy', 'review_scores_cleanliness', 'review_scores_checkin', 'review_scores_communication', 'review_scores_location', 'review_scores_value', 'requires_license', 'license', 'jurisdiction_names', 'instant_bookable', 'cancellation_policy', 'require_guest_profile_picture', 'require_guest_phone_verification', 'calculated_host_listings_count', 'reviews_per_month']
cols(zillow)
['RegionID', 'RegionName', 'City', 'State', 'Metro', 'CountyName', 'SizeRank', '1996-04', '1996-05', '1996-06', '1996-07', '1996-08', '1996-09', '1996-10', '1996-11', '1996-12', '1997-01', '1997-02', '1997-03', '1997-04', '1997-05', '1997-06', '1997-07', '1997-08', '1997-09', '1997-10', '1997-11', '1997-12', '1998-01', '1998-02', '1998-03', '1998-04', '1998-05', '1998-06', '1998-07', '1998-08', '1998-09', '1998-10', '1998-11', '1998-12', '1999-01', '1999-02', '1999-03', '1999-04', '1999-05', '1999-06', '1999-07', '1999-08', '1999-09', '1999-10', '1999-11', '1999-12', '2000-01', '2000-02', '2000-03', '2000-04', '2000-05', '2000-06', '2000-07', '2000-08', '2000-09', '2000-10', '2000-11', '2000-12', '2001-01', '2001-02', '2001-03', '2001-04', '2001-05', '2001-06', '2001-07', '2001-08', '2001-09', '2001-10', '2001-11', '2001-12', '2002-01', '2002-02', '2002-03', '2002-04', '2002-05', '2002-06', '2002-07', '2002-08', '2002-09', '2002-10', '2002-11', '2002-12', '2003-01', '2003-02', '2003-03', '2003-04', '2003-05', '2003-06', '2003-07', '2003-08', '2003-09', '2003-10', '2003-11', '2003-12', '2004-01', '2004-02', '2004-03', '2004-04', '2004-05', '2004-06', '2004-07', '2004-08', '2004-09', '2004-10', '2004-11', '2004-12', '2005-01', '2005-02', '2005-03', '2005-04', '2005-05', '2005-06', '2005-07', '2005-08', '2005-09', '2005-10', '2005-11', '2005-12', '2006-01', '2006-02', '2006-03', '2006-04', '2006-05', '2006-06', '2006-07', '2006-08', '2006-09', '2006-10', '2006-11', '2006-12', '2007-01', '2007-02', '2007-03', '2007-04', '2007-05', '2007-06', '2007-07', '2007-08', '2007-09', '2007-10', '2007-11', '2007-12', '2008-01', '2008-02', '2008-03', '2008-04', '2008-05', '2008-06', '2008-07', '2008-08', '2008-09', '2008-10', '2008-11', '2008-12', '2009-01', '2009-02', '2009-03', '2009-04', '2009-05', '2009-06', '2009-07', '2009-08', '2009-09', '2009-10', '2009-11', '2009-12', '2010-01', '2010-02', '2010-03', '2010-04', '2010-05', '2010-06', '2010-07', '2010-08', '2010-09', '2010-10', '2010-11', '2010-12', '2011-01', '2011-02', '2011-03', '2011-04', '2011-05', '2011-06', '2011-07', '2011-08', '2011-09', '2011-10', '2011-11', '2011-12', '2012-01', '2012-02', '2012-03', '2012-04', '2012-05', '2012-06', '2012-07', '2012-08', '2012-09', '2012-10', '2012-11', '2012-12', '2013-01', '2013-02', '2013-03', '2013-04', '2013-05', '2013-06', '2013-07', '2013-08', '2013-09', '2013-10', '2013-11', '2013-12', '2014-01', '2014-02', '2014-03', '2014-04', '2014-05', '2014-06', '2014-07', '2014-08', '2014-09', '2014-10', '2014-11', '2014-12', '2015-01', '2015-02', '2015-03', '2015-04', '2015-05', '2015-06', '2015-07', '2015-08', '2015-09', '2015-10', '2015-11', '2015-12', '2016-01', '2016-02', '2016-03', '2016-04', '2016-05', '2016-06', '2016-07', '2016-08', '2016-09', '2016-10', '2016-11', '2016-12', '2017-01', '2017-02', '2017-03', '2017-04', '2017-05', '2017-06']
# Function to determine the Missing Value Percent in every Column of a Dataset
def mis_value_percent(dataset):
return(100* dataset.isnull().sum()/ len(dataset))
mis_value_percent(zillow)
RegionID 0.000000 RegionName 0.000000 City 0.000000 State 0.000000 Metro 2.794545 CountyName 0.000000 SizeRank 0.000000 1996-04 29.756316 1996-05 28.862061 1996-06 28.862061 1996-07 28.806170 1996-08 28.794992 1996-09 28.794992 1996-10 28.794992 1996-11 28.683210 1996-12 28.683210 1997-01 28.414934 1997-02 23.619495 1997-03 23.395931 1997-04 23.395931 1997-05 23.395931 1997-06 23.373575 1997-07 23.373575 1997-08 22.289291 1997-09 22.255757 1997-10 22.255757 1997-11 22.222222 1997-12 22.177510 1998-01 21.987480 1998-02 20.366644 ... 2015-01 0.000000 2015-02 0.000000 2015-03 0.000000 2015-04 0.000000 2015-05 0.000000 2015-06 0.000000 2015-07 0.000000 2015-08 0.000000 2015-09 0.000000 2015-10 0.000000 2015-11 0.000000 2015-12 0.000000 2016-01 0.000000 2016-02 0.201207 2016-03 0.201207 2016-04 0.201207 2016-05 0.201207 2016-06 0.000000 2016-07 0.000000 2016-08 0.000000 2016-09 0.000000 2016-10 0.000000 2016-11 0.000000 2016-12 0.033535 2017-01 0.000000 2017-02 0.000000 2017-03 0.000000 2017-04 0.000000 2017-05 0.000000 2017-06 0.000000 Length: 262, dtype: float64
airbnb.shape
(40753, 95)
airbnb.drop_duplicates().shape
(40753, 95)
No duplicates are found in AirBnB data as the dimensions didn't change
zillow.shape
(8946, 262)
zillow.drop_duplicates().shape
(8946, 262)
Similarly no duplicate rows are found in Zillow dataset
airbnb[(airbnb < 0).all(1)]
id | listing_url | scrape_id | last_scraped | name | summary | space | description | experiences_offered | neighborhood_overview | ... | review_scores_value | requires_license | license | jurisdiction_names | instant_bookable | cancellation_policy | require_guest_profile_picture | require_guest_phone_verification | calculated_host_listings_count | reviews_per_month |
---|
0 rows × 95 columns
zillow[(zillow < 0).all(1)]
RegionID | RegionName | City | State | Metro | CountyName | SizeRank | 1996-04 | 1996-05 | 1996-06 | ... | 2016-09 | 2016-10 | 2016-11 | 2016-12 | 2017-01 | 2017-02 | 2017-03 | 2017-04 | 2017-05 | 2017-06 |
---|
0 rows × 262 columns
No negative values are found in both the dataframes
zillow.City.unique()
array(['New York', 'Chicago', 'El Paso', ..., "Hart'S Location", 'Waterville Valley', 'Shelburne Falls'], dtype=object)
zillow.RegionName
0 10025 1 60657 2 10023 3 60614 4 79936 5 60640 6 94109 7 77494 8 32162 9 60647 10 37211 11 37013 12 60618 13 10128 14 10011 15 28269 16 77573 17 30349 18 90046 19 77584 20 10003 21 60613 22 78660 23 75052 24 20009 25 20002 26 21234 27 63376 28 30044 29 60629 ... 8916 47640 8917 92333 8918 3605 8919 47968 8920 47597 8921 80515 8922 3282 8923 55781 8924 96148 8925 3457 8926 80456 8927 80510 8928 1255 8929 95497 8930 55053 8931 95728 8932 97149 8933 3279 8934 12480 8935 80481 8936 1270 8937 3812 8938 3215 8939 47965 8940 3765 8941 80532 8942 12429 8943 97028 8944 1338 8945 3293 Name: RegionName, Length: 8946, dtype: int64
# Cleaning the Zillow Data:
def zillow_clean(dataset, city_of_interest):
#Filtering Out the City Based on the Requirement
dataset=dataset[dataset.City==city_of_interest]
# Renaming the Column Name "RegionName" to "zipcode"
dataset.rename(columns= {"RegionName": "zipcode"}, inplace= True)
# Converting the zipcode column to integer format
dataset['zipcode']= dataset['zipcode'].astype(int)
return dataset
# Using the Function and Storing it as zillow
zillow=zillow_clean(zillow, "New York")
C:\Users\shari\Anaconda3\lib\site-packages\pandas\core\frame.py:2746: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy C:\Users\shari\Anaconda3\lib\site-packages\ipykernel_launcher.py:12: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
zillow.shape
(25, 262)
mis_value_percent(airbnb)
id 0.000000 listing_url 0.000000 scrape_id 0.000000 last_scraped 0.000000 name 0.085883 summary 3.962898 space 31.585405 description 0.046622 experiences_offered 0.000000 neighborhood_overview 42.018992 notes 64.157240 transit 37.526072 access 42.136775 interaction 44.845778 house_rules 36.399774 thumbnail_url 21.142002 medium_url 21.142002 picture_url 0.000000 xl_picture_url 21.142002 host_id 0.000000 host_url 0.000000 host_name 0.615906 host_since 0.615906 host_location 1.020784 host_about 38.505141 host_response_time 32.247933 host_response_rate 32.247933 host_acceptance_rate 100.000000 host_is_superhost 0.615906 host_thumbnail_url 0.615906 ... guests_included 0.000000 extra_people 0.000000 minimum_nights 0.000000 maximum_nights 0.000000 calendar_updated 0.000000 has_availability 100.000000 availability_30 0.000000 availability_60 0.000000 availability_90 0.000000 availability_365 0.000000 calendar_last_scraped 0.000000 number_of_reviews 0.000000 first_review 22.817952 last_review 22.538218 review_scores_rating 24.351581 review_scores_accuracy 24.579786 review_scores_cleanliness 24.501264 review_scores_checkin 24.626408 review_scores_communication 24.503718 review_scores_location 24.628862 review_scores_value 24.633769 requires_license 0.000000 license 100.000000 jurisdiction_names 99.980370 instant_bookable 0.000000 cancellation_policy 0.000000 require_guest_profile_picture 0.000000 require_guest_phone_verification 0.000000 calculated_host_listings_count 0.000000 reviews_per_month 22.817952 Length: 95, dtype: float64
airbnb.shape
(40753, 95)
# Function to clean up the Airbnb Dataset and Filter it based on the Number of Bedrooms
def airbnb_clean(dataset, number_of_bedrooms):
# Remove the Dollar Sign in the Price Column
dataset['price']=dataset['price'].replace( '[\$,)]','', regex=True ).astype(float)
# Some rows have their postal code along with the zip code and we will be needing only the Zipcode(5 numbers alone)
dataset['zipcode']= dataset['zipcode'].str[:5]
# Removing the Nan values from Zipcodes
dataset['zipcode'] = pd.to_numeric(dataset['zipcode'], errors='coerce')
dataset.dropna(subset=['zipcode'], inplace=True)
dataset['zipcode']= dataset['zipcode'].astype(int)
# Here data cleaning was performed on Reviews below since they are used further down the analyses
# Converting the 0 values in the number_of_reviews with their group(zipcode) mean values
dataset['number_of_reviews']=dataset.groupby('zipcode')['number_of_reviews'].transform(lambda x: x.replace(0,x.mean()))
# Removing the NA values in those 2 columns and replacing with their group(zipcode) mean values
dataset[['review_scores_rating', 'review_scores_accuracy']]=dataset.groupby('zipcode')['review_scores_rating', 'review_scores_accuracy'].transform(lambda x: x.fillna(x.mean()))
# Filter by the Number of Bedrooms
dataset= dataset.loc[dataset.bedrooms ==number_of_bedrooms]
return dataset
# Using the above function to Filter out the 2 Bedrooms data and store it in airbnb
airbnb= airbnb_clean(airbnb,2)
# Resulting Shape
airbnb.shape
(3882, 95)
# Checking if zipcode is converted to Integer Format
airbnb.dtypes
id int64 listing_url object scrape_id float64 last_scraped object name object summary object space object description object experiences_offered object neighborhood_overview object notes object transit object access object interaction object house_rules object thumbnail_url object medium_url object picture_url object xl_picture_url object host_id int64 host_url object host_name object host_since object host_location object host_about object host_response_time object host_response_rate object host_acceptance_rate float64 host_is_superhost object host_thumbnail_url object ... guests_included int64 extra_people object minimum_nights int64 maximum_nights int64 calendar_updated object has_availability float64 availability_30 int64 availability_60 int64 availability_90 int64 availability_365 int64 calendar_last_scraped object number_of_reviews float64 first_review object last_review object review_scores_rating float64 review_scores_accuracy float64 review_scores_cleanliness float64 review_scores_checkin float64 review_scores_communication float64 review_scores_location float64 review_scores_value float64 requires_license object license float64 jurisdiction_names object instant_bookable object cancellation_policy object require_guest_profile_picture object require_guest_phone_verification object calculated_host_listings_count int64 reviews_per_month float64 Length: 95, dtype: object
# Checking if Zipcode is in correct format
airbnb.zipcode.unique()
array([10462, 10469, 11102, 11105, 11370, 11417, 11416, 10458, 10305, 11692, 11103, 11106, 11101, 11356, 11228, 10006, 10004, 10280, 11209, 11220, 11360, 11233, 11221, 11216, 11213, 11238, 11205, 11206, 11237, 11694, 11001, 11426, 11214, 11204, 11201, 11217, 11218, 11219, 11235, 11224, 11225, 11226, 11203, 10455, 10454, 11212, 11207, 11385, 10002, 10013, 10038, 11236, 11231, 10011, 10010, 10001, 10003, 10014, 10036, 10473, 10304, 10451, 10452, 11368, 10465, 11208, 11369, 11210, 10029, 10026, 10035, 10037, 10128, 10027, 10459, 11373, 10012, 10009, 10016, 10475, 11691, 10270, 10005, 11230, 11234, 11358, 11354, 10018, 11109, 10463, 10019, 10022, 10017, 10306, 11378, 11377, 11379, 10025, 10467, 10301, 11215, 10461, 11374, 11418, 11693, 10044, 7310, 10312, 11422, 11229, 11223, 11420, 11434, 11412, 11104, 10460, 11232, 10111, 10007, 10028, 10065, 10021, 10075, 10162, 10024, 10023, 10069, 10466, 10032, 10033, 10040, 10310, 11211, 11249, 11222], dtype=int64)
mis_value_percent(airbnb)
id 0.000000 listing_url 0.000000 scrape_id 0.000000 last_scraped 0.000000 name 0.025760 summary 4.070067 space 25.837197 description 0.025760 experiences_offered 0.000000 neighborhood_overview 38.021638 notes 61.746522 transit 34.389490 access 40.340031 interaction 42.606904 house_rules 31.942298 thumbnail_url 24.343122 medium_url 24.343122 picture_url 0.000000 xl_picture_url 24.343122 host_id 0.000000 host_url 0.000000 host_name 0.283359 host_since 0.283359 host_location 0.489438 host_about 34.209171 host_response_time 27.408552 host_response_rate 27.408552 host_acceptance_rate 100.000000 host_is_superhost 0.283359 host_thumbnail_url 0.283359 ... guests_included 0.000000 extra_people 0.000000 minimum_nights 0.000000 maximum_nights 0.000000 calendar_updated 0.000000 has_availability 100.000000 availability_30 0.000000 availability_60 0.000000 availability_90 0.000000 availability_365 0.000000 calendar_last_scraped 0.000000 number_of_reviews 0.000000 first_review 19.809377 last_review 19.500258 review_scores_rating 0.025760 review_scores_accuracy 0.025760 review_scores_cleanliness 20.942813 review_scores_checkin 21.045853 review_scores_communication 20.968573 review_scores_location 21.045853 review_scores_value 21.045853 requires_license 0.000000 license 100.000000 jurisdiction_names 99.974240 instant_bookable 0.000000 cancellation_policy 0.000000 require_guest_profile_picture 0.000000 require_guest_phone_verification 0.000000 calculated_host_listings_count 0.000000 reviews_per_month 19.809377 Length: 95, dtype: float64
# Function to basically group the Airbnb based on Zipcodes and then Join with the Zillow Data
def group_join(x,y):
# Since our Airbnb has lot of properties in each Zipcodes- Grouping by the Zipcode and taking the mean
airbnb_meandf=x.groupby('zipcode',as_index=False).mean()
# Joining this groupped by Dataframe with the y dataframe which will be Zillow Dataframe
final=pd.merge(airbnb_meandf, y, on='zipcode')
return final
# Storing it in a Variable Called "df"
df= group_join(airbnb, zillow)
df
zipcode | id | scrape_id | host_id | host_acceptance_rate | host_listings_count | host_total_listings_count | latitude | longitude | accommodates | ... | 2016-09 | 2016-10 | 2016-11 | 2016-12 | 2017-01 | 2017-02 | 2017-03 | 2017-04 | 2017-05 | 2017-06 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 10003 | 9.124661e+06 | 2.017050e+13 | 2.496037e+07 | NaN | 1.646018 | 1.646018 | 40.728902 | -73.988258 | 4.566372 | ... | 1932800 | 1930400 | 1937500 | 1935100.0 | 1915700 | 1916500 | 1965700 | 2045300 | 2109100 | 2147000 |
1 | 10011 | 9.889107e+06 | 2.017050e+13 | 3.005599e+07 | NaN | 1.448276 | 1.448276 | 40.742371 | -73.999752 | 4.517241 | ... | 2354000 | 2355500 | 2352200 | 2332100.0 | 2313300 | 2319600 | 2342100 | 2365900 | 2419700 | 2480400 |
2 | 10013 | 1.054726e+07 | 2.017050e+13 | 2.783653e+07 | NaN | 1.412698 | 1.412698 | 40.719555 | -74.004075 | 4.281250 | ... | 3347100 | 3305500 | 3261100 | 3244000.0 | 3231400 | 3183300 | 3160200 | 3193500 | 3262200 | 3316500 |
3 | 10014 | 8.654217e+06 | 2.017050e+13 | 2.054620e+07 | NaN | 1.537634 | 1.537634 | 40.734209 | -74.004240 | 4.344086 | ... | 2480800 | 2443200 | 2430100 | 2452900.0 | 2451200 | 2441900 | 2460900 | 2494900 | 2498400 | 2491600 |
4 | 10021 | 1.022891e+07 | 2.017050e+13 | 3.568867e+07 | NaN | 1.526316 | 1.526316 | 40.768441 | -73.956628 | 4.263158 | ... | 1668300 | 1664800 | 1659500 | 1656000.0 | 1643700 | 1649800 | 1688500 | 1731400 | 1774100 | 1815600 |
5 | 10022 | 9.963782e+06 | 2.017050e+13 | 4.129974e+07 | NaN | 1.394737 | 1.394737 | 40.758886 | -73.966700 | 5.447368 | ... | 1773200 | 1760600 | 1762000 | 1776800.0 | 1775200 | 1774500 | 1812700 | 1914600 | 1997800 | 2031600 |
6 | 10023 | 8.424752e+06 | 2.017050e+13 | 3.053048e+07 | NaN | 2.888889 | 2.888889 | 40.777581 | -73.981486 | 4.666667 | ... | 1993500 | 1980700 | 1960900 | 1951300.0 | 1937800 | 1929800 | 1955000 | 2022400 | 2095000 | 2142300 |
7 | 10025 | 7.849437e+06 | 2.017050e+13 | 2.567103e+07 | NaN | 1.571429 | 1.571429 | 40.798025 | -73.966355 | 4.821429 | ... | 1374400 | 1364100 | 1366300 | 1354800.0 | 1327500 | 1317300 | 1333700 | 1352100 | 1390000 | 1431000 |
8 | 10028 | 1.038785e+07 | 2.017050e+13 | 3.790560e+07 | NaN | 1.375000 | 1.375000 | 40.775367 | -73.950934 | 4.281250 | ... | 1814600 | 1820400 | 1810000 | 1786500.0 | 1787100 | 1801100 | 1847900 | 1922800 | 2015700 | 2083900 |
9 | 10036 | 8.401201e+06 | 2.017050e+13 | 3.853206e+07 | NaN | 73.500000 | 73.500000 | 40.754810 | -73.987824 | 4.750000 | ... | 1766400 | 1749800 | 1744100 | 1737900.0 | 1722800 | 1736600 | 1750100 | 1735500 | 1717100 | 1712900 |
10 | 10128 | 1.079017e+07 | 2.017050e+13 | 2.984589e+07 | NaN | 1.321429 | 1.321429 | 40.781842 | -73.949537 | 4.357143 | ... | 1526000 | 1523700 | 1527200 | 1541600.0 | 1557800 | 1582900 | 1598900 | 1646100 | 1720500 | 1787100 |
11 | 10304 | 1.590640e+06 | 2.017050e+13 | 7.534770e+06 | NaN | 1.000000 | 1.000000 | 40.626721 | -74.076721 | 3.500000 | ... | 284500 | 287100 | 290700 | 294700.0 | 299000 | 303800 | 308300 | 313800 | 321300 | 328300 |
12 | 10305 | 1.368595e+07 | 2.017050e+13 | 5.535834e+07 | NaN | 1.555556 | 1.555556 | 40.596360 | -74.077371 | 4.555556 | ... | 373500 | 378600 | 382700 | 385300.0 | 387900 | 394500 | 403800 | 412000 | 418900 | 425100 |
13 | 10306 | 8.328929e+06 | 2.017050e+13 | 5.237069e+07 | NaN | 2.000000 | 2.000000 | 40.568879 | -74.111565 | 5.500000 | ... | 316100 | 318300 | 321200 | 322900.0 | 325400 | 331700 | 339000 | 343500 | 347800 | 352900 |
14 | 10312 | 1.800434e+07 | 2.017050e+13 | 1.197988e+08 | NaN | 1.000000 | 1.000000 | 40.546331 | -74.183701 | 2.000000 | ... | 325400 | 326600 | 328400 | 330300.0 | 333600 | 339300 | 345000 | 348400 | 351700 | 355000 |
15 | 11201 | 9.298929e+06 | 2.017050e+13 | 2.327364e+07 | NaN | 1.451613 | 1.451613 | 40.692611 | -73.990576 | 4.451613 | ... | 1340200 | 1338700 | 1350600 | 1375600.0 | 1390200 | 1398100 | 1399900 | 1400500 | 1407300 | 1420700 |
16 | 11215 | 9.035422e+06 | 2.017050e+13 | 2.460656e+07 | NaN | 1.422764 | 1.422764 | 40.666241 | -73.984527 | 4.266129 | ... | 982200 | 992000 | 1002600 | 1021400.0 | 1039300 | 1040800 | 1035800 | 1050000 | 1066200 | 1070800 |
17 | 11217 | 6.866520e+06 | 2.017050e+13 | 1.988437e+07 | NaN | 1.283333 | 1.283333 | 40.681377 | -73.979068 | 3.983333 | ... | 1207600 | 1209800 | 1216200 | 1221400.0 | 1225600 | 1225000 | 1219600 | 1238100 | 1276400 | 1302300 |
18 | 11231 | 7.697324e+06 | 2.017050e+13 | 2.171201e+07 | NaN | 1.550725 | 1.550725 | 40.680632 | -74.000049 | 4.391304 | ... | 1128000 | 1152400 | 1166000 | 1166500.0 | 1166600 | 1182900 | 1202200 | 1213200 | 1211700 | 1202900 |
19 | 11234 | 1.066553e+07 | 2.017050e+13 | 2.752097e+07 | NaN | 1.333333 | 1.333333 | 40.623074 | -73.925943 | 4.166667 | ... | 448400 | 452300 | 457200 | 461200.0 | 464400 | 466400 | 470900 | 475700 | 477100 | 476900 |
20 | 11434 | 9.374263e+06 | 2.017050e+13 | 4.863844e+07 | NaN | 1.000000 | 1.000000 | 40.662033 | -73.769800 | 4.000000 | ... | 342100 | 348300 | 356100 | 358400.0 | 357500 | 361900 | 370800 | 377100 | 380200 | 382300 |
21 rows × 295 columns
cols(df)
['zipcode', 'id', 'scrape_id', 'host_id', 'host_acceptance_rate', 'host_listings_count', 'host_total_listings_count', 'latitude', 'longitude', 'accommodates', 'bathrooms', 'bedrooms', 'beds', 'square_feet', 'price', 'guests_included', 'minimum_nights', 'maximum_nights', 'has_availability', 'availability_30', 'availability_60', 'availability_90', 'availability_365', 'number_of_reviews', 'review_scores_rating', 'review_scores_accuracy', 'review_scores_cleanliness', 'review_scores_checkin', 'review_scores_communication', 'review_scores_location', 'review_scores_value', 'license', 'calculated_host_listings_count', 'reviews_per_month', 'RegionID', 'City', 'State', 'Metro', 'CountyName', 'SizeRank', '1996-04', '1996-05', '1996-06', '1996-07', '1996-08', '1996-09', '1996-10', '1996-11', '1996-12', '1997-01', '1997-02', '1997-03', '1997-04', '1997-05', '1997-06', '1997-07', '1997-08', '1997-09', '1997-10', '1997-11', '1997-12', '1998-01', '1998-02', '1998-03', '1998-04', '1998-05', '1998-06', '1998-07', '1998-08', '1998-09', '1998-10', '1998-11', '1998-12', '1999-01', '1999-02', '1999-03', '1999-04', '1999-05', '1999-06', '1999-07', '1999-08', '1999-09', '1999-10', '1999-11', '1999-12', '2000-01', '2000-02', '2000-03', '2000-04', '2000-05', '2000-06', '2000-07', '2000-08', '2000-09', '2000-10', '2000-11', '2000-12', '2001-01', '2001-02', '2001-03', '2001-04', '2001-05', '2001-06', '2001-07', '2001-08', '2001-09', '2001-10', '2001-11', '2001-12', '2002-01', '2002-02', '2002-03', '2002-04', '2002-05', '2002-06', '2002-07', '2002-08', '2002-09', '2002-10', '2002-11', '2002-12', '2003-01', '2003-02', '2003-03', '2003-04', '2003-05', '2003-06', '2003-07', '2003-08', '2003-09', '2003-10', '2003-11', '2003-12', '2004-01', '2004-02', '2004-03', '2004-04', '2004-05', '2004-06', '2004-07', '2004-08', '2004-09', '2004-10', '2004-11', '2004-12', '2005-01', '2005-02', '2005-03', '2005-04', '2005-05', '2005-06', '2005-07', '2005-08', '2005-09', '2005-10', '2005-11', '2005-12', '2006-01', '2006-02', '2006-03', '2006-04', '2006-05', '2006-06', '2006-07', '2006-08', '2006-09', '2006-10', '2006-11', '2006-12', '2007-01', '2007-02', '2007-03', '2007-04', '2007-05', '2007-06', '2007-07', '2007-08', '2007-09', '2007-10', '2007-11', '2007-12', '2008-01', '2008-02', '2008-03', '2008-04', '2008-05', '2008-06', '2008-07', '2008-08', '2008-09', '2008-10', '2008-11', '2008-12', '2009-01', '2009-02', '2009-03', '2009-04', '2009-05', '2009-06', '2009-07', '2009-08', '2009-09', '2009-10', '2009-11', '2009-12', '2010-01', '2010-02', '2010-03', '2010-04', '2010-05', '2010-06', '2010-07', '2010-08', '2010-09', '2010-10', '2010-11', '2010-12', '2011-01', '2011-02', '2011-03', '2011-04', '2011-05', '2011-06', '2011-07', '2011-08', '2011-09', '2011-10', '2011-11', '2011-12', '2012-01', '2012-02', '2012-03', '2012-04', '2012-05', '2012-06', '2012-07', '2012-08', '2012-09', '2012-10', '2012-11', '2012-12', '2013-01', '2013-02', '2013-03', '2013-04', '2013-05', '2013-06', '2013-07', '2013-08', '2013-09', '2013-10', '2013-11', '2013-12', '2014-01', '2014-02', '2014-03', '2014-04', '2014-05', '2014-06', '2014-07', '2014-08', '2014-09', '2014-10', '2014-11', '2014-12', '2015-01', '2015-02', '2015-03', '2015-04', '2015-05', '2015-06', '2015-07', '2015-08', '2015-09', '2015-10', '2015-11', '2015-12', '2016-01', '2016-02', '2016-03', '2016-04', '2016-05', '2016-06', '2016-07', '2016-08', '2016-09', '2016-10', '2016-11', '2016-12', '2017-01', '2017-02', '2017-03', '2017-04', '2017-05', '2017-06']
# Zillow data has time series and thus extracting the time series data to perform Time Series analysis and Forecasting
timeseriesdf= df.ix[:,40:]
timeseriesdf.shape
C:\Users\shari\Anaconda3\lib\site-packages\ipykernel_launcher.py:3: DeprecationWarning: .ix is deprecated. Please use .loc for label based indexing or .iloc for positional indexing See the documentation here: http://pandas.pydata.org/pandas-docs/stable/indexing.html#ix-indexer-is-deprecated
(21, 255)
timeseriesdf.head()
1996-04 | 1996-05 | 1996-06 | 1996-07 | 1996-08 | 1996-09 | 1996-10 | 1996-11 | 1996-12 | 1997-01 | ... | 2016-09 | 2016-10 | 2016-11 | 2016-12 | 2017-01 | 2017-02 | 2017-03 | 2017-04 | 2017-05 | 2017-06 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 1932800 | 1930400 | 1937500 | 1935100.0 | 1915700 | 1916500 | 1965700 | 2045300 | 2109100 | 2147000 |
1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 2354000 | 2355500 | 2352200 | 2332100.0 | 2313300 | 2319600 | 2342100 | 2365900 | 2419700 | 2480400 |
2 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 3347100 | 3305500 | 3261100 | 3244000.0 | 3231400 | 3183300 | 3160200 | 3193500 | 3262200 | 3316500 |
3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 2480800 | 2443200 | 2430100 | 2452900.0 | 2451200 | 2441900 | 2460900 | 2494900 | 2498400 | 2491600 |
4 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 1668300 | 1664800 | 1659500 | 1656000.0 | 1643700 | 1649800 | 1688500 | 1731400 | 1774100 | 1815600 |
5 rows × 255 columns
# Transposing the Time Series Dataset to arrange it in sequential order so that it becomes easy to perform Time Series analysis
timeseriesdf= timeseriesdf.T
timeseriesdf.head()
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1996-04 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 117500.0 | 131900.0 | 138100.0 | 153000.0 | NaN | NaN | NaN | NaN | NaN | NaN |
1996-05 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 113700.0 | 131900.0 | 136900.0 | 149800.0 | NaN | NaN | NaN | NaN | NaN | NaN |
1996-06 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 111200.0 | 131400.0 | 135900.0 | 146900.0 | NaN | NaN | NaN | NaN | NaN | NaN |
1996-07 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 111100.0 | 130800.0 | 134500.0 | 144800.0 | NaN | NaN | NaN | NaN | NaN | NaN |
1996-08 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 110400.0 | 130200.0 | 132900.0 | 143000.0 | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 21 columns
timeseriesdf.shape
(255, 21)
# Converting the Index to a DataTime Format
timeseriesdf.index = pd.to_datetime(timeseriesdf.index)
timeseriesdf.head()
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1996-04-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 117500.0 | 131900.0 | 138100.0 | 153000.0 | NaN | NaN | NaN | NaN | NaN | NaN |
1996-05-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 113700.0 | 131900.0 | 136900.0 | 149800.0 | NaN | NaN | NaN | NaN | NaN | NaN |
1996-06-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 111200.0 | 131400.0 | 135900.0 | 146900.0 | NaN | NaN | NaN | NaN | NaN | NaN |
1996-07-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 111100.0 | 130800.0 | 134500.0 | 144800.0 | NaN | NaN | NaN | NaN | NaN | NaN |
1996-08-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 110400.0 | 130200.0 | 132900.0 | 143000.0 | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 21 columns
# Visualizing the Time Series for all the 21 Zipcodes
# Changing the Figure Size
from pylab import rcParams
rcParams['figure.figsize'] = 15, 7
# Plotting the Time Series Plot
timeseriesdf.plot()
# marking the Red Dotted line on 2010-01-01
plt.axvline(pd.to_datetime('2010-01-01'), color='r', linestyle='--', lw=2)
plt.legend()
plt.show()
From the plot, there are many irregular patterns and trends along with many missing values before the red vertical line. So the time series data after the vertical line was considered.
Those Irregular patterns can be attributed to the recession during the period of 2008 and 2009 vitnessing a sudden drop in the costs.
Also after the vertical line, the trend seems to be linearly increasing without any abrupt changes for all the Zipcodes
Sometimes, pyramid doesn't do Auto ARIMA and throws up an error. So in order for it to work efficiently, it is better to have these versions installed
# Checking Versions
import sys; print(sys.version)
import numpy; print(numpy.__version__)
import scipy; print(scipy.__version__)
import sklearn; print(sklearn.__version__)
import pandas; print(pandas.__version__)
import statsmodels; print(statsmodels.__version__)
import pyramid; print(pyramid.__version__)
import pip; print(pip.__version__)
3.6.3 |Anaconda, Inc.| (default, Oct 15 2017, 03:27:45) [MSC v.1900 64 bit (AMD64)] 1.14.2 1.0.0 0.19.1 0.20.3 0.9.0 0.9.0 19.0.3
C:\Users\shari\Anaconda3\lib\site-packages\pyramid\__init__.py:68: UserWarning: The 'pyramid' package will be migrating to a new namespace beginning in version 1.0.0: 'pmdarima'. This is due to a package name collision with the Pyramid web framework. For more information, see Issue #34: https://github.com/tgsmith61591/pyramid/issues/34 The package will subsequently be installable via the name 'pmdarima'; the only functional change to the user will be the import name. All imports from 'pyramid' will change to 'pmdarima'.
timeseriesdf.head()
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1996-04-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 117500.0 | 131900.0 | 138100.0 | 153000.0 | NaN | NaN | NaN | NaN | NaN | NaN |
1996-05-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 113700.0 | 131900.0 | 136900.0 | 149800.0 | NaN | NaN | NaN | NaN | NaN | NaN |
1996-06-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 111200.0 | 131400.0 | 135900.0 | 146900.0 | NaN | NaN | NaN | NaN | NaN | NaN |
1996-07-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 111100.0 | 130800.0 | 134500.0 | 144800.0 | NaN | NaN | NaN | NaN | NaN | NaN |
1996-08-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 110400.0 | 130200.0 | 132900.0 | 143000.0 | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 21 columns
# Filtering out the Timeseries Data From 2010-01-01
timeseriesdf=timeseriesdf[timeseriesdf.index>= '2010-01-01']
# Check for Missing Values
mis_value_percent(timeseriesdf)
0 0.0 1 0.0 2 0.0 3 0.0 4 0.0 5 0.0 6 0.0 7 0.0 8 0.0 9 0.0 10 0.0 11 0.0 12 0.0 13 0.0 14 0.0 15 0.0 16 0.0 17 0.0 18 0.0 19 0.0 20 0.0 dtype: float64
timeseriesdf
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2010-01-01 | 1227800.0 | 1378600.0 | 2268700.0 | 1451800.0 | 1163900.0 | 1323000.0 | 1271900.0 | 813300.0 | 1126000.0 | 1238600.0 | ... | 266200.0 | 331100.0 | 297200.0 | 296200.0 | 742800.0 | 529300.0 | 619000.0 | 591300.0 | 378700.0 | 315400.0 |
2010-02-01 | 1202600.0 | 1372300.0 | 2294100.0 | 1454400.0 | 1176400.0 | 1338500.0 | 1252300.0 | 821500.0 | 1107300.0 | 1227700.0 | ... | 266100.0 | 330000.0 | 296500.0 | 296400.0 | 747500.0 | 544800.0 | 619300.0 | 589600.0 | 386600.0 | 320300.0 |
2010-03-01 | 1207100.0 | 1393100.0 | 2273600.0 | 1494500.0 | 1184500.0 | 1344600.0 | 1262300.0 | 831700.0 | 1116100.0 | 1230700.0 | ... | 264700.0 | 328100.0 | 296100.0 | 296900.0 | 750300.0 | 540900.0 | 612700.0 | 589500.0 | 391600.0 | 315800.0 |
2010-04-01 | 1220100.0 | 1430000.0 | 2261700.0 | 1542200.0 | 1192700.0 | 1348300.0 | 1279200.0 | 845100.0 | 1138500.0 | 1237300.0 | ... | 262500.0 | 327200.0 | 295200.0 | 296700.0 | 753800.0 | 538300.0 | 612800.0 | 589300.0 | 392900.0 | 314000.0 |
2010-05-01 | 1252400.0 | 1462100.0 | 2282800.0 | 1587300.0 | 1203100.0 | 1350800.0 | 1309000.0 | 854500.0 | 1175900.0 | 1250400.0 | ... | 260800.0 | 324500.0 | 293600.0 | 295100.0 | 758900.0 | 536800.0 | 614500.0 | 590900.0 | 392900.0 | 311800.0 |
2010-06-01 | 1285700.0 | 1476800.0 | 2288000.0 | 1616000.0 | 1203500.0 | 1365100.0 | 1335300.0 | 858900.0 | 1199700.0 | 1260500.0 | ... | 260700.0 | 321600.0 | 292700.0 | 293800.0 | 764600.0 | 539400.0 | 611300.0 | 595800.0 | 393800.0 | 312600.0 |
2010-07-01 | 1278300.0 | 1479000.0 | 2275700.0 | 1612300.0 | 1200400.0 | 1390900.0 | 1353800.0 | 859200.0 | 1198700.0 | 1260700.0 | ... | 260900.0 | 321000.0 | 293300.0 | 294500.0 | 767300.0 | 546600.0 | 605900.0 | 599000.0 | 392800.0 | 313100.0 |
2010-08-01 | 1279100.0 | 1476600.0 | 2281300.0 | 1605800.0 | 1202900.0 | 1402700.0 | 1366400.0 | 863500.0 | 1205000.0 | 1257300.0 | ... | 259900.0 | 320600.0 | 294400.0 | 295600.0 | 769400.0 | 553100.0 | 604400.0 | 600500.0 | 390800.0 | 307000.0 |
2010-09-01 | 1326700.0 | 1461200.0 | 2302800.0 | 1592500.0 | 1210100.0 | 1403700.0 | 1372100.0 | 876000.0 | 1237500.0 | 1249000.0 | ... | 259200.0 | 318900.0 | 294600.0 | 295300.0 | 772400.0 | 553900.0 | 610100.0 | 602700.0 | 390400.0 | 298200.0 |
2010-10-01 | 1376500.0 | 1448300.0 | 2290900.0 | 1584200.0 | 1210300.0 | 1404900.0 | 1381300.0 | 886100.0 | 1276200.0 | 1234500.0 | ... | 258000.0 | 316500.0 | 293100.0 | 292900.0 | 779100.0 | 559000.0 | 618000.0 | 607600.0 | 390000.0 | 293200.0 |
2010-11-01 | 1368900.0 | 1441600.0 | 2267800.0 | 1587600.0 | 1211700.0 | 1403500.0 | 1385000.0 | 890000.0 | 1304400.0 | 1231800.0 | ... | 256600.0 | 315700.0 | 291800.0 | 291900.0 | 785600.0 | 569100.0 | 624400.0 | 617100.0 | 389900.0 | 290400.0 |
2010-12-01 | 1366000.0 | 1444300.0 | 2253600.0 | 1594300.0 | 1219100.0 | 1390200.0 | 1388100.0 | 894200.0 | 1310800.0 | 1244100.0 | ... | 255100.0 | 316200.0 | 290800.0 | 292800.0 | 786700.0 | 573200.0 | 627300.0 | 632300.0 | 388400.0 | 286000.0 |
2011-01-01 | 1381300.0 | 1438600.0 | 2245300.0 | 1611700.0 | 1222500.0 | 1376600.0 | 1399100.0 | 901800.0 | 1310700.0 | 1255300.0 | ... | 254000.0 | 316400.0 | 289400.0 | 293900.0 | 782000.0 | 573100.0 | 628500.0 | 642900.0 | 387100.0 | 276800.0 |
2011-02-01 | 1380700.0 | 1434100.0 | 2264200.0 | 1631800.0 | 1230600.0 | 1372400.0 | 1399800.0 | 909500.0 | 1318200.0 | 1260500.0 | ... | 253700.0 | 315400.0 | 288400.0 | 293500.0 | 782700.0 | 579600.0 | 630900.0 | 642800.0 | 385300.0 | 263700.0 |
2011-03-01 | 1368500.0 | 1439300.0 | 2301000.0 | 1648800.0 | 1247400.0 | 1378100.0 | 1389300.0 | 913300.0 | 1321900.0 | 1257500.0 | ... | 253000.0 | 313000.0 | 288200.0 | 291700.0 | 785800.0 | 588800.0 | 635000.0 | 643000.0 | 380700.0 | 251400.0 |
2011-04-01 | 1372700.0 | 1437700.0 | 2328600.0 | 1670500.0 | 1258000.0 | 1384600.0 | 1384700.0 | 907400.0 | 1309200.0 | 1246800.0 | ... | 251400.0 | 308700.0 | 285700.0 | 289300.0 | 784600.0 | 592200.0 | 636700.0 | 646800.0 | 377000.0 | 245500.0 |
2011-05-01 | 1378000.0 | 1430300.0 | 2337000.0 | 1674300.0 | 1250800.0 | 1389600.0 | 1380900.0 | 900000.0 | 1298700.0 | 1231400.0 | ... | 250100.0 | 306000.0 | 281900.0 | 287400.0 | 785100.0 | 592700.0 | 635600.0 | 651700.0 | 377900.0 | 243900.0 |
2011-06-01 | 1361700.0 | 1426800.0 | 2341500.0 | 1657100.0 | 1251500.0 | 1390200.0 | 1367900.0 | 897700.0 | 1289000.0 | 1229400.0 | ... | 248900.0 | 305400.0 | 279900.0 | 285900.0 | 795300.0 | 600100.0 | 640300.0 | 656400.0 | 377200.0 | 242100.0 |
2011-07-01 | 1357800.0 | 1427800.0 | 2335200.0 | 1659200.0 | 1253100.0 | 1386600.0 | 1365400.0 | 896300.0 | 1258400.0 | 1247200.0 | ... | 247400.0 | 304900.0 | 280600.0 | 284700.0 | 799300.0 | 610400.0 | 648100.0 | 665800.0 | 373100.0 | 242000.0 |
2011-08-01 | 1364400.0 | 1424600.0 | 2316700.0 | 1661800.0 | 1252000.0 | 1393100.0 | 1375100.0 | 892300.0 | 1238900.0 | 1272200.0 | ... | 245700.0 | 304600.0 | 280700.0 | 283100.0 | 798000.0 | 616600.0 | 652400.0 | 671300.0 | 371400.0 | 245200.0 |
2011-09-01 | 1358000.0 | 1432800.0 | 2299900.0 | 1641600.0 | 1253800.0 | 1399900.0 | 1380400.0 | 890400.0 | 1247100.0 | 1287700.0 | ... | 242800.0 | 303000.0 | 279700.0 | 281600.0 | 801800.0 | 619800.0 | 654800.0 | 667600.0 | 370700.0 | 247100.0 |
2011-10-01 | 1329800.0 | 1456500.0 | 2318700.0 | 1637800.0 | 1257900.0 | 1381500.0 | 1377000.0 | 888600.0 | 1260600.0 | 1296700.0 | ... | 240400.0 | 300700.0 | 278800.0 | 281000.0 | 808000.0 | 619800.0 | 657100.0 | 656800.0 | 368500.0 | 245300.0 |
2011-11-01 | 1317800.0 | 1485100.0 | 2396400.0 | 1646600.0 | 1256600.0 | 1363900.0 | 1375100.0 | 891700.0 | 1255000.0 | 1302500.0 | ... | 238900.0 | 299700.0 | 277500.0 | 280100.0 | 808200.0 | 617100.0 | 655600.0 | 646100.0 | 366600.0 | 244300.0 |
2011-12-01 | 1333200.0 | 1500200.0 | 2479200.0 | 1652100.0 | 1253500.0 | 1362700.0 | 1379000.0 | 899500.0 | 1250200.0 | 1301800.0 | ... | 237600.0 | 299400.0 | 276400.0 | 278500.0 | 805600.0 | 611100.0 | 653600.0 | 638800.0 | 365500.0 | 244500.0 |
2012-01-01 | 1348500.0 | 1509600.0 | 2490300.0 | 1674800.0 | 1250500.0 | 1371500.0 | 1395200.0 | 904400.0 | 1249200.0 | 1298000.0 | ... | 235800.0 | 297300.0 | 275400.0 | 276700.0 | 805100.0 | 605800.0 | 650300.0 | 638300.0 | 362100.0 | 245000.0 |
2012-02-01 | 1349500.0 | 1518500.0 | 2466000.0 | 1714300.0 | 1247700.0 | 1372000.0 | 1414500.0 | 908200.0 | 1253400.0 | 1292300.0 | ... | 233900.0 | 295900.0 | 274100.0 | 275800.0 | 813000.0 | 606000.0 | 650300.0 | 648000.0 | 360400.0 | 247200.0 |
2012-03-01 | 1352200.0 | 1530800.0 | 2492400.0 | 1736800.0 | 1256800.0 | 1385500.0 | 1419000.0 | 914000.0 | 1275300.0 | 1288500.0 | ... | 232600.0 | 296700.0 | 273400.0 | 276600.0 | 825500.0 | 611200.0 | 652100.0 | 657000.0 | 361300.0 | 252100.0 |
2012-04-01 | 1354100.0 | 1538000.0 | 2542800.0 | 1736800.0 | 1273600.0 | 1402800.0 | 1403100.0 | 915100.0 | 1314200.0 | 1286600.0 | ... | 231600.0 | 297800.0 | 273600.0 | 277700.0 | 835700.0 | 615100.0 | 653000.0 | 661300.0 | 362500.0 | 254900.0 |
2012-05-01 | 1351900.0 | 1530500.0 | 2585500.0 | 1741200.0 | 1282200.0 | 1411600.0 | 1383200.0 | 912300.0 | 1329400.0 | 1288000.0 | ... | 230100.0 | 298600.0 | 273500.0 | 278300.0 | 839100.0 | 615000.0 | 654800.0 | 662900.0 | 363700.0 | 256000.0 |
2012-06-01 | 1364200.0 | 1524500.0 | 2618100.0 | 1757700.0 | 1278600.0 | 1397400.0 | 1376700.0 | 914000.0 | 1316900.0 | 1290900.0 | ... | 229200.0 | 299700.0 | 273800.0 | 279500.0 | 836700.0 | 616600.0 | 666300.0 | 669900.0 | 365900.0 | 257100.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2015-01-01 | 1716300.0 | 1979200.0 | 3067600.0 | 2082800.0 | 1470700.0 | 1712000.0 | 1717700.0 | 1189700.0 | 1583200.0 | 1706000.0 | ... | 247700.0 | 316000.0 | 284400.0 | 305400.0 | 1253600.0 | 877100.0 | 948800.0 | 955400.0 | 392500.0 | 283800.0 |
2015-02-01 | 1720500.0 | 1982900.0 | 3047400.0 | 2107300.0 | 1471900.0 | 1706400.0 | 1700100.0 | 1193700.0 | 1553800.0 | 1691400.0 | ... | 249300.0 | 314500.0 | 283800.0 | 304900.0 | 1264500.0 | 890000.0 | 958200.0 | 959400.0 | 396200.0 | 285900.0 |
2015-03-01 | 1721800.0 | 2001600.0 | 3054500.0 | 2142400.0 | 1458700.0 | 1696900.0 | 1680400.0 | 1199900.0 | 1518000.0 | 1688700.0 | ... | 251800.0 | 312900.0 | 282200.0 | 304000.0 | 1270500.0 | 907400.0 | 978600.0 | 960700.0 | 401200.0 | 287300.0 |
2015-04-01 | 1741800.0 | 2014700.0 | 3071700.0 | 2167100.0 | 1442700.0 | 1672900.0 | 1676400.0 | 1201400.0 | 1508800.0 | 1694400.0 | ... | 254600.0 | 313100.0 | 283700.0 | 304600.0 | 1276300.0 | 920600.0 | 1013600.0 | 969000.0 | 404900.0 | 287900.0 |
2015-05-01 | 1775800.0 | 2023500.0 | 3078300.0 | 2189400.0 | 1436400.0 | 1656300.0 | 1685600.0 | 1202600.0 | 1515900.0 | 1703200.0 | ... | 257700.0 | 315500.0 | 287800.0 | 306600.0 | 1289600.0 | 925700.0 | 1028300.0 | 987700.0 | 409000.0 | 288800.0 |
2015-06-01 | 1796500.0 | 2055300.0 | 3073100.0 | 2220100.0 | 1445800.0 | 1670900.0 | 1708100.0 | 1214200.0 | 1532800.0 | 1709600.0 | ... | 260900.0 | 318700.0 | 291800.0 | 308700.0 | 1303800.0 | 936300.0 | 1020200.0 | 1004400.0 | 414500.0 | 289900.0 |
2015-07-01 | 1821500.0 | 2078300.0 | 3089400.0 | 2250300.0 | 1471800.0 | 1696900.0 | 1730400.0 | 1235200.0 | 1548500.0 | 1709900.0 | ... | 262400.0 | 322000.0 | 294500.0 | 310200.0 | 1305300.0 | 954000.0 | 1024200.0 | 1022900.0 | 419000.0 | 294300.0 |
2015-08-01 | 1870100.0 | 2083600.0 | 3121700.0 | 2275700.0 | 1493700.0 | 1713900.0 | 1751800.0 | 1258000.0 | 1550500.0 | 1703200.0 | ... | 262700.0 | 324900.0 | 297100.0 | 311300.0 | 1298900.0 | 969700.0 | 1052400.0 | 1044200.0 | 423300.0 | 302500.0 |
2015-09-01 | 1901000.0 | 2088800.0 | 3134200.0 | 2301600.0 | 1515100.0 | 1718600.0 | 1778300.0 | 1287700.0 | 1550200.0 | 1697700.0 | ... | 263300.0 | 326600.0 | 299300.0 | 312400.0 | 1301000.0 | 973200.0 | 1090500.0 | 1064800.0 | 428700.0 | 309300.0 |
2015-10-01 | 1904900.0 | 2110600.0 | 3152400.0 | 2314700.0 | 1542600.0 | 1721000.0 | 1810400.0 | 1307200.0 | 1560600.0 | 1693000.0 | ... | 264900.0 | 328800.0 | 301500.0 | 313800.0 | 1314200.0 | 971300.0 | 1113800.0 | 1090400.0 | 432500.0 | 311600.0 |
2015-11-01 | 1914000.0 | 2127500.0 | 3173600.0 | 2322000.0 | 1556500.0 | 1717000.0 | 1831600.0 | 1313900.0 | 1575700.0 | 1685400.0 | ... | 266200.0 | 332100.0 | 303600.0 | 315100.0 | 1322800.0 | 974100.0 | 1123100.0 | 1125400.0 | 433400.0 | 313100.0 |
2015-12-01 | 1926400.0 | 2168900.0 | 3195000.0 | 2334500.0 | 1551500.0 | 1721700.0 | 1844400.0 | 1317100.0 | 1587100.0 | 1694300.0 | ... | 267000.0 | 336700.0 | 304400.0 | 315800.0 | 1320500.0 | 978100.0 | 1137000.0 | 1144400.0 | 435600.0 | 315500.0 |
2016-01-01 | 1932200.0 | 2204700.0 | 3234600.0 | 2360300.0 | 1558800.0 | 1746200.0 | 1861600.0 | 1327400.0 | 1610900.0 | 1725400.0 | ... | 267700.0 | 341300.0 | 303800.0 | 316100.0 | 1318800.0 | 975400.0 | 1161200.0 | 1138100.0 | 437800.0 | 318200.0 |
2016-02-01 | 1936700.0 | 2216100.0 | 3302400.0 | 2384700.0 | 1577200.0 | 1763200.0 | 1889600.0 | 1338800.0 | 1646100.0 | 1742000.0 | ... | 268600.0 | 344700.0 | 304200.0 | 316300.0 | 1325600.0 | 967200.0 | 1179700.0 | 1121800.0 | 436200.0 | 320800.0 |
2016-03-01 | 1945200.0 | 2212500.0 | 3335800.0 | 2388200.0 | 1584500.0 | 1743800.0 | 1901500.0 | 1350400.0 | 1659500.0 | 1734500.0 | ... | 269600.0 | 347800.0 | 306200.0 | 316300.0 | 1333000.0 | 963800.0 | 1192500.0 | 1103900.0 | 434600.0 | 325400.0 |
2016-04-01 | 1935600.0 | 2222600.0 | 3324200.0 | 2358300.0 | 1587300.0 | 1709600.0 | 1895300.0 | 1356600.0 | 1648100.0 | 1729700.0 | ... | 271300.0 | 351700.0 | 307900.0 | 317000.0 | 1334800.0 | 962500.0 | 1194700.0 | 1082200.0 | 437600.0 | 329400.0 |
2016-05-01 | 1911200.0 | 2231900.0 | 3285100.0 | 2345800.0 | 1594100.0 | 1703500.0 | 1890200.0 | 1358500.0 | 1654800.0 | 1741800.0 | ... | 273000.0 | 356200.0 | 309600.0 | 318500.0 | 1333100.0 | 966100.0 | 1198400.0 | 1067000.0 | 440100.0 | 331700.0 |
2016-06-01 | 1918700.0 | 2250800.0 | 3274100.0 | 2381700.0 | 1613000.0 | 1731000.0 | 1898400.0 | 1364000.0 | 1693800.0 | 1764700.0 | ... | 274600.0 | 359400.0 | 312700.0 | 320400.0 | 1334600.0 | 971500.0 | 1203700.0 | 1077200.0 | 437900.0 | 332700.0 |
2016-07-01 | 1947600.0 | 2285200.0 | 3298600.0 | 2439700.0 | 1636600.0 | 1762600.0 | 1924500.0 | 1373300.0 | 1745600.0 | 1777000.0 | ... | 277600.0 | 363100.0 | 315800.0 | 322500.0 | 1339000.0 | 975300.0 | 1206000.0 | 1093200.0 | 438700.0 | 334900.0 |
2016-08-01 | 1951300.0 | 2329100.0 | 3341100.0 | 2483000.0 | 1657500.0 | 1778100.0 | 1967300.0 | 1382600.0 | 1790400.0 | 1776200.0 | ... | 281400.0 | 367900.0 | 316400.0 | 324200.0 | 1343000.0 | 976200.0 | 1206100.0 | 1108300.0 | 443600.0 | 338000.0 |
2016-09-01 | 1932800.0 | 2354000.0 | 3347100.0 | 2480800.0 | 1668300.0 | 1773200.0 | 1993500.0 | 1374400.0 | 1814600.0 | 1766400.0 | ... | 284500.0 | 373500.0 | 316100.0 | 325400.0 | 1340200.0 | 982200.0 | 1207600.0 | 1128000.0 | 448400.0 | 342100.0 |
2016-10-01 | 1930400.0 | 2355500.0 | 3305500.0 | 2443200.0 | 1664800.0 | 1760600.0 | 1980700.0 | 1364100.0 | 1820400.0 | 1749800.0 | ... | 287100.0 | 378600.0 | 318300.0 | 326600.0 | 1338700.0 | 992000.0 | 1209800.0 | 1152400.0 | 452300.0 | 348300.0 |
2016-11-01 | 1937500.0 | 2352200.0 | 3261100.0 | 2430100.0 | 1659500.0 | 1762000.0 | 1960900.0 | 1366300.0 | 1810000.0 | 1744100.0 | ... | 290700.0 | 382700.0 | 321200.0 | 328400.0 | 1350600.0 | 1002600.0 | 1216200.0 | 1166000.0 | 457200.0 | 356100.0 |
2016-12-01 | 1935100.0 | 2332100.0 | 3244000.0 | 2452900.0 | 1656000.0 | 1776800.0 | 1951300.0 | 1354800.0 | 1786500.0 | 1737900.0 | ... | 294700.0 | 385300.0 | 322900.0 | 330300.0 | 1375600.0 | 1021400.0 | 1221400.0 | 1166500.0 | 461200.0 | 358400.0 |
2017-01-01 | 1915700.0 | 2313300.0 | 3231400.0 | 2451200.0 | 1643700.0 | 1775200.0 | 1937800.0 | 1327500.0 | 1787100.0 | 1722800.0 | ... | 299000.0 | 387900.0 | 325400.0 | 333600.0 | 1390200.0 | 1039300.0 | 1225600.0 | 1166600.0 | 464400.0 | 357500.0 |
2017-02-01 | 1916500.0 | 2319600.0 | 3183300.0 | 2441900.0 | 1649800.0 | 1774500.0 | 1929800.0 | 1317300.0 | 1801100.0 | 1736600.0 | ... | 303800.0 | 394500.0 | 331700.0 | 339300.0 | 1398100.0 | 1040800.0 | 1225000.0 | 1182900.0 | 466400.0 | 361900.0 |
2017-03-01 | 1965700.0 | 2342100.0 | 3160200.0 | 2460900.0 | 1688500.0 | 1812700.0 | 1955000.0 | 1333700.0 | 1847900.0 | 1750100.0 | ... | 308300.0 | 403800.0 | 339000.0 | 345000.0 | 1399900.0 | 1035800.0 | 1219600.0 | 1202200.0 | 470900.0 | 370800.0 |
2017-04-01 | 2045300.0 | 2365900.0 | 3193500.0 | 2494900.0 | 1731400.0 | 1914600.0 | 2022400.0 | 1352100.0 | 1922800.0 | 1735500.0 | ... | 313800.0 | 412000.0 | 343500.0 | 348400.0 | 1400500.0 | 1050000.0 | 1238100.0 | 1213200.0 | 475700.0 | 377100.0 |
2017-05-01 | 2109100.0 | 2419700.0 | 3262200.0 | 2498400.0 | 1774100.0 | 1997800.0 | 2095000.0 | 1390000.0 | 2015700.0 | 1717100.0 | ... | 321300.0 | 418900.0 | 347800.0 | 351700.0 | 1407300.0 | 1066200.0 | 1276400.0 | 1211700.0 | 477100.0 | 380200.0 |
2017-06-01 | 2147000.0 | 2480400.0 | 3316500.0 | 2491600.0 | 1815600.0 | 2031600.0 | 2142300.0 | 1431000.0 | 2083900.0 | 1712900.0 | ... | 328300.0 | 425100.0 | 352900.0 | 355000.0 | 1420700.0 | 1070800.0 | 1302300.0 | 1202900.0 | 476900.0 | 382300.0 |
90 rows × 21 columns
# Importing the Library for performing the Auto ARIMA:
from pyramid.arima import auto_arima
# Creating a Function to Perform Auto ARIMA
def auto_arima_func(dataset):
stepwise_model = auto_arima(dataset, start_p=1, start_q=1,
max_p=3, max_q=3, m=12,
seasonal=True,
trace=True,
error_action='ignore',
suppress_warnings=True,
stepwise=True)
# Here it is essential to forecast the values upto 22 periods to obtain the Current Cost (For April 2019)
# If the Current Month changes if using this in the future, the n_periods value below will change accordingly counting from July 2017.
future_forecast = stepwise_model.predict(n_periods=22)
return future_forecast
final_forecast=[]
# Here it is required to forecast for 21 Zipcodes. If the number of Zipcodes change in the future, the range in for loop also changes
for i in range(0,21,1):
final_forecast.append(auto_arima_func(timeseriesdf[i]))
Fit ARIMA: order=(1, 1, 1) seasonal_order=(1, 0, 1, 12); AIC=2073.591, BIC=2088.523, Fit time=0.441 seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(0, 0, 0, 12); AIC=2024.471, BIC=2029.448, Fit time=0.010 seconds Fit ARIMA: order=(1, 1, 0) seasonal_order=(1, 0, 0, 12); AIC=2048.337, BIC=2058.291, Fit time=0.291 seconds Fit ARIMA: order=(0, 1, 1) seasonal_order=(0, 0, 1, 12); AIC=2301.472, BIC=2311.426, Fit time=0.265 seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(1, 0, 0, 12); AIC=2026.429, BIC=2033.895, Fit time=0.091 seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(0, 0, 1, 12); AIC=2026.435, BIC=2033.901, Fit time=0.031 seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(1, 0, 1, 12); AIC=2028.368, BIC=2038.323, Fit time=0.053 seconds Fit ARIMA: order=(1, 1, 0) seasonal_order=(0, 0, 0, 12); AIC=2046.404, BIC=2053.870, Fit time=0.067 seconds Fit ARIMA: order=(0, 1, 1) seasonal_order=(0, 0, 0, 12); AIC=2310.301, BIC=2317.767, Fit time=0.067 seconds Fit ARIMA: order=(1, 1, 1) seasonal_order=(0, 0, 0, 12); AIC=2072.606, BIC=2082.561, Fit time=0.113 seconds Total fit time: 1.440 seconds Fit ARIMA: order=(1, 1, 1) seasonal_order=(1, 0, 1, 12); AIC=2049.476, BIC=2064.408, Fit time=0.361 seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(0, 0, 0, 12); AIC=1994.220, BIC=1999.197, Fit time=0.010 seconds Fit ARIMA: order=(1, 1, 0) seasonal_order=(1, 0, 0, 12); AIC=2069.662, BIC=2079.616, Fit time=0.072 seconds Fit ARIMA: order=(0, 1, 1) seasonal_order=(0, 0, 1, 12); AIC=2013.937, BIC=2023.892, Fit time=0.145 seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(1, 0, 0, 12); AIC=1994.735, BIC=2002.201, Fit time=0.216 seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(0, 0, 1, 12); AIC=1994.615, BIC=2002.081, Fit time=0.045 seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(1, 0, 1, 12); AIC=1996.250, BIC=2006.205, Fit time=0.087 seconds Fit ARIMA: order=(1, 1, 0) seasonal_order=(0, 0, 0, 12); AIC=2068.013, BIC=2075.479, Fit time=0.034 seconds Fit ARIMA: order=(0, 1, 1) seasonal_order=(0, 0, 0, 12); AIC=2013.788, BIC=2021.254, Fit time=0.076 seconds Fit ARIMA: order=(1, 1, 1) seasonal_order=(0, 0, 0, 12); AIC=2054.437, BIC=2064.392, Fit time=0.125 seconds Total fit time: 1.174 seconds Fit ARIMA: order=(1, 1, 1) seasonal_order=(1, 0, 1, 12); AIC=nan, BIC=nan, Fit time=nan seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(0, 0, 0, 12); AIC=2072.449, BIC=2077.426, Fit time=0.009 seconds Fit ARIMA: order=(1, 1, 0) seasonal_order=(1, 0, 0, 12); AIC=2089.155, BIC=2099.110, Fit time=0.058 seconds Fit ARIMA: order=(0, 1, 1) seasonal_order=(0, 0, 1, 12); AIC=nan, BIC=nan, Fit time=nan seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(1, 0, 0, 12); AIC=2073.418, BIC=2080.884, Fit time=0.038 seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(0, 0, 1, 12); AIC=2073.356, BIC=2080.822, Fit time=0.038 seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(1, 0, 1, 12); AIC=2074.935, BIC=2084.890, Fit time=0.156 seconds Fit ARIMA: order=(1, 1, 0) seasonal_order=(0, 0, 0, 12); AIC=2087.809, BIC=2095.275, Fit time=0.045 seconds Fit ARIMA: order=(0, 1, 1) seasonal_order=(0, 0, 0, 12); AIC=nan, BIC=nan, Fit time=nan seconds Fit ARIMA: order=(1, 1, 1) seasonal_order=(0, 0, 0, 12); AIC=nan, BIC=nan, Fit time=nan seconds Total fit time: 0.357 seconds Fit ARIMA: order=(1, 1, 1) seasonal_order=(1, 0, 1, 12); AIC=2153.670, BIC=2168.602, Fit time=0.430 seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(0, 0, 0, 12); AIC=2072.641, BIC=2077.619, Fit time=0.009 seconds Fit ARIMA: order=(1, 1, 0) seasonal_order=(1, 0, 0, 12); AIC=2113.746, BIC=2123.700, Fit time=0.059 seconds Fit ARIMA: order=(0, 1, 1) seasonal_order=(0, 0, 1, 12); AIC=nan, BIC=nan, Fit time=nan seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(1, 0, 0, 12); AIC=2074.542, BIC=2082.008, Fit time=0.092 seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(0, 0, 1, 12); AIC=2074.544, BIC=2082.010, Fit time=0.040 seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(1, 0, 1, 12); AIC=2076.540, BIC=2086.495, Fit time=0.209 seconds Fit ARIMA: order=(1, 1, 0) seasonal_order=(0, 0, 0, 12); AIC=2111.802, BIC=2119.268, Fit time=0.078 seconds Fit ARIMA: order=(0, 1, 1) seasonal_order=(0, 0, 0, 12); AIC=nan, BIC=nan, Fit time=nan seconds Fit ARIMA: order=(1, 1, 1) seasonal_order=(0, 0, 0, 12); AIC=2151.444, BIC=2161.399, Fit time=0.155 seconds Total fit time: 1.078 seconds Fit ARIMA: order=(1, 1, 1) seasonal_order=(1, 0, 1, 12); AIC=2029.024, BIC=2043.956, Fit time=0.426 seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(0, 0, 0, 12); AIC=1955.196, BIC=1960.173, Fit time=0.012 seconds Fit ARIMA: order=(1, 1, 0) seasonal_order=(1, 0, 0, 12); AIC=2009.863, BIC=2019.817, Fit time=0.074 seconds Fit ARIMA: order=(0, 1, 1) seasonal_order=(0, 0, 1, 12); AIC=nan, BIC=nan, Fit time=nan seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(1, 0, 0, 12); AIC=1956.970, BIC=1964.436, Fit time=0.028 seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(0, 0, 1, 12); AIC=1956.951, BIC=1964.417, Fit time=0.104 seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(1, 0, 1, 12); AIC=1958.811, BIC=1968.765, Fit time=0.229 seconds Fit ARIMA: order=(1, 1, 0) seasonal_order=(0, 0, 0, 12); AIC=2007.865, BIC=2015.331, Fit time=0.035 seconds Fit ARIMA: order=(0, 1, 1) seasonal_order=(0, 0, 0, 12); AIC=nan, BIC=nan, Fit time=nan seconds Fit ARIMA: order=(1, 1, 1) seasonal_order=(0, 0, 0, 12); AIC=1958.940, BIC=1968.894, Fit time=0.280 seconds Total fit time: 1.197 seconds Fit ARIMA: order=(1, 1, 1) seasonal_order=(1, 0, 1, 12); AIC=2099.666, BIC=2114.598, Fit time=0.506 seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(0, 0, 0, 12); AIC=2023.957, BIC=2028.934, Fit time=0.011 seconds Fit ARIMA: order=(1, 1, 0) seasonal_order=(1, 0, 0, 12); AIC=2049.482, BIC=2059.437, Fit time=0.062 seconds Fit ARIMA: order=(0, 1, 1) seasonal_order=(0, 0, 1, 12); AIC=nan, BIC=nan, Fit time=nan seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(1, 0, 0, 12); AIC=2024.348, BIC=2031.814, Fit time=0.224 seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(0, 0, 1, 12); AIC=2024.313, BIC=2031.779, Fit time=0.038 seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(1, 0, 1, 12); AIC=2026.232, BIC=2036.186, Fit time=0.202 seconds Fit ARIMA: order=(1, 1, 0) seasonal_order=(0, 0, 0, 12); AIC=2049.004, BIC=2056.470, Fit time=0.045 seconds Fit ARIMA: order=(0, 1, 1) seasonal_order=(0, 0, 0, 12); AIC=nan, BIC=nan, Fit time=nan seconds Fit ARIMA: order=(1, 1, 1) seasonal_order=(0, 0, 0, 12); AIC=2107.801, BIC=2117.756, Fit time=0.208 seconds Total fit time: 1.305 seconds Fit ARIMA: order=(1, 1, 1) seasonal_order=(1, 0, 1, 12); AIC=2124.294, BIC=2139.226, Fit time=0.475 seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(0, 0, 0, 12); AIC=2000.033, BIC=2005.010, Fit time=0.010 seconds Fit ARIMA: order=(1, 1, 0) seasonal_order=(1, 0, 0, 12); AIC=2076.713, BIC=2086.668, Fit time=0.074 seconds Fit ARIMA: order=(0, 1, 1) seasonal_order=(0, 0, 1, 12); AIC=nan, BIC=nan, Fit time=nan seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(1, 0, 0, 12); AIC=2002.008, BIC=2009.474, Fit time=0.034 seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(0, 0, 1, 12); AIC=2002.010, BIC=2009.476, Fit time=0.075 seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(1, 0, 1, 12); AIC=2003.999, BIC=2013.954, Fit time=0.184 seconds Fit ARIMA: order=(1, 1, 0) seasonal_order=(0, 0, 0, 12); AIC=2075.098, BIC=2082.564, Fit time=0.037 seconds Fit ARIMA: order=(0, 1, 1) seasonal_order=(0, 0, 0, 12); AIC=nan, BIC=nan, Fit time=nan seconds Fit ARIMA: order=(1, 1, 1) seasonal_order=(0, 0, 0, 12); AIC=2129.996, BIC=2139.951, Fit time=0.173 seconds Total fit time: 1.069 seconds Fit ARIMA: order=(1, 1, 1) seasonal_order=(1, 0, 1, 12); AIC=1958.550, BIC=1973.482, Fit time=0.474 seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(0, 0, 0, 12); AIC=1914.629, BIC=1919.606, Fit time=0.008 seconds Fit ARIMA: order=(1, 1, 0) seasonal_order=(1, 0, 0, 12); AIC=1967.119, BIC=1977.074, Fit time=0.068 seconds Fit ARIMA: order=(0, 1, 1) seasonal_order=(0, 0, 1, 12); AIC=1938.367, BIC=1948.322, Fit time=0.189 seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(1, 0, 0, 12); AIC=1916.621, BIC=1924.086, Fit time=0.039 seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(0, 0, 1, 12); AIC=1916.621, BIC=1924.087, Fit time=0.081 seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(1, 0, 1, 12); AIC=1918.617, BIC=1928.571, Fit time=0.103 seconds Fit ARIMA: order=(1, 1, 0) seasonal_order=(0, 0, 0, 12); AIC=1965.819, BIC=1973.285, Fit time=0.032 seconds Fit ARIMA: order=(0, 1, 1) seasonal_order=(0, 0, 0, 12); AIC=1936.399, BIC=1943.865, Fit time=0.074 seconds Fit ARIMA: order=(1, 1, 1) seasonal_order=(0, 0, 0, 12); AIC=1958.698, BIC=1968.652, Fit time=0.145 seconds Total fit time: 1.215 seconds Fit ARIMA: order=(1, 1, 1) seasonal_order=(1, 0, 1, 12); AIC=2138.392, BIC=2153.324, Fit time=0.458 seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(0, 0, 0, 12); AIC=2036.481, BIC=2041.458, Fit time=0.013 seconds Fit ARIMA: order=(1, 1, 0) seasonal_order=(1, 0, 0, 12); AIC=2098.283, BIC=2108.237, Fit time=0.077 seconds Fit ARIMA: order=(0, 1, 1) seasonal_order=(0, 0, 1, 12); AIC=nan, BIC=nan, Fit time=nan seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(1, 0, 0, 12); AIC=2038.117, BIC=2045.583, Fit time=0.031 seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(0, 0, 1, 12); AIC=2038.137, BIC=2045.603, Fit time=0.038 seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(1, 0, 1, 12); AIC=2040.066, BIC=2050.020, Fit time=0.224 seconds Fit ARIMA: order=(1, 1, 0) seasonal_order=(0, 0, 0, 12); AIC=2096.309, BIC=2103.775, Fit time=0.031 seconds Fit ARIMA: order=(0, 1, 1) seasonal_order=(0, 0, 0, 12); AIC=nan, BIC=nan, Fit time=nan seconds Fit ARIMA: order=(1, 1, 1) seasonal_order=(0, 0, 0, 12); AIC=2138.882, BIC=2148.837, Fit time=0.215 seconds Total fit time: 1.098 seconds Fit ARIMA: order=(1, 1, 1) seasonal_order=(1, 0, 1, 12); AIC=1986.413, BIC=2001.345, Fit time=0.436 seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(0, 0, 0, 12); AIC=1941.134, BIC=1946.112, Fit time=0.009 seconds Fit ARIMA: order=(1, 1, 0) seasonal_order=(1, 0, 0, 12); AIC=1970.275, BIC=1980.230, Fit time=0.065 seconds Fit ARIMA: order=(0, 1, 1) seasonal_order=(0, 0, 1, 12); AIC=nan, BIC=nan, Fit time=nan seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(1, 0, 0, 12); AIC=1942.996, BIC=1950.462, Fit time=0.040 seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(0, 0, 1, 12); AIC=1943.007, BIC=1950.473, Fit time=0.038 seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(1, 0, 1, 12); AIC=1944.801, BIC=1954.756, Fit time=0.179 seconds Fit ARIMA: order=(1, 1, 0) seasonal_order=(0, 0, 0, 12); AIC=1944.876, BIC=1952.342, Fit time=0.113 seconds Fit ARIMA: order=(0, 1, 1) seasonal_order=(0, 0, 0, 12); AIC=nan, BIC=nan, Fit time=nan seconds Fit ARIMA: order=(1, 1, 1) seasonal_order=(0, 0, 0, 12); AIC=1985.697, BIC=1995.652, Fit time=0.171 seconds Total fit time: 1.057 seconds Fit ARIMA: order=(1, 2, 1) seasonal_order=(1, 0, 1, 12); AIC=1900.633, BIC=1915.497, Fit time=0.450 seconds Fit ARIMA: order=(0, 2, 0) seasonal_order=(0, 0, 0, 12); AIC=1894.129, BIC=1899.084, Fit time=0.010 seconds Fit ARIMA: order=(1, 2, 0) seasonal_order=(1, 0, 0, 12); AIC=1897.932, BIC=1907.841, Fit time=0.118 seconds Fit ARIMA: order=(0, 2, 1) seasonal_order=(0, 0, 1, 12); AIC=1898.675, BIC=1908.584, Fit time=0.089 seconds Fit ARIMA: order=(0, 2, 0) seasonal_order=(1, 0, 0, 12); AIC=1895.925, BIC=1903.357, Fit time=0.048 seconds Fit ARIMA: order=(0, 2, 0) seasonal_order=(0, 0, 1, 12); AIC=1895.920, BIC=1903.352, Fit time=0.050 seconds Fit ARIMA: order=(0, 2, 0) seasonal_order=(1, 0, 1, 12); AIC=1897.343, BIC=1907.252, Fit time=0.235 seconds Fit ARIMA: order=(1, 2, 0) seasonal_order=(0, 0, 0, 12); AIC=1896.138, BIC=1903.570, Fit time=0.026 seconds Fit ARIMA: order=(0, 2, 1) seasonal_order=(0, 0, 0, 12); AIC=1896.913, BIC=1904.345, Fit time=0.069 seconds Fit ARIMA: order=(1, 2, 1) seasonal_order=(0, 0, 0, 12); AIC=1898.024, BIC=1907.933, Fit time=0.100 seconds Total fit time: 1.199 seconds Fit ARIMA: order=(1, 2, 1) seasonal_order=(1, 0, 1, 12); AIC=1430.055, BIC=1444.919, Fit time=0.350 seconds Fit ARIMA: order=(0, 2, 0) seasonal_order=(0, 0, 0, 12); AIC=1426.192, BIC=1431.147, Fit time=0.009 seconds Fit ARIMA: order=(1, 2, 0) seasonal_order=(1, 0, 0, 12); AIC=1425.517, BIC=1435.426, Fit time=0.126 seconds Fit ARIMA: order=(0, 2, 1) seasonal_order=(0, 0, 1, 12); AIC=1426.566, BIC=1436.475, Fit time=0.065 seconds Fit ARIMA: order=(1, 2, 0) seasonal_order=(0, 0, 0, 12); AIC=1424.291, BIC=1431.723, Fit time=0.144 seconds Fit ARIMA: order=(1, 2, 0) seasonal_order=(0, 0, 1, 12); AIC=1425.512, BIC=1435.421, Fit time=0.124 seconds Fit ARIMA: order=(1, 2, 0) seasonal_order=(1, 0, 1, 12); AIC=1427.499, BIC=1439.886, Fit time=0.255 seconds Fit ARIMA: order=(2, 2, 0) seasonal_order=(0, 0, 0, 12); AIC=1425.368, BIC=1435.277, Fit time=0.188 seconds Fit ARIMA: order=(1, 2, 1) seasonal_order=(0, 0, 0, 12); AIC=1427.004, BIC=1436.913, Fit time=0.107 seconds Fit ARIMA: order=(2, 2, 1) seasonal_order=(0, 0, 0, 12); AIC=1435.623, BIC=1448.009, Fit time=0.190 seconds Total fit time: 1.562 seconds Fit ARIMA: order=(1, 2, 1) seasonal_order=(1, 0, 1, 12); AIC=1493.631, BIC=1508.495, Fit time=0.499 seconds Fit ARIMA: order=(0, 2, 0) seasonal_order=(0, 0, 0, 12); AIC=1488.721, BIC=1493.676, Fit time=0.010 seconds Fit ARIMA: order=(1, 2, 0) seasonal_order=(1, 0, 0, 12); AIC=1491.254, BIC=1501.163, Fit time=0.139 seconds Fit ARIMA: order=(0, 2, 1) seasonal_order=(0, 0, 1, 12); AIC=1491.474, BIC=1501.383, Fit time=0.066 seconds Fit ARIMA: order=(0, 2, 0) seasonal_order=(1, 0, 0, 12); AIC=1490.077, BIC=1497.509, Fit time=0.050 seconds Fit ARIMA: order=(0, 2, 0) seasonal_order=(0, 0, 1, 12); AIC=1490.067, BIC=1497.499, Fit time=0.061 seconds Fit ARIMA: order=(0, 2, 0) seasonal_order=(1, 0, 1, 12); AIC=1491.472, BIC=1501.381, Fit time=0.056 seconds Fit ARIMA: order=(1, 2, 0) seasonal_order=(0, 0, 0, 12); AIC=1489.928, BIC=1497.360, Fit time=0.052 seconds Fit ARIMA: order=(0, 2, 1) seasonal_order=(0, 0, 0, 12); AIC=1490.198, BIC=1497.630, Fit time=0.065 seconds Fit ARIMA: order=(1, 2, 1) seasonal_order=(0, 0, 0, 12); AIC=1490.968, BIC=1500.878, Fit time=0.245 seconds Total fit time: 1.248 seconds Fit ARIMA: order=(1, 2, 1) seasonal_order=(1, 0, 1, 12); AIC=1514.459, BIC=1529.323, Fit time=0.099 seconds Fit ARIMA: order=(0, 2, 0) seasonal_order=(0, 0, 0, 12); AIC=1506.264, BIC=1511.219, Fit time=0.009 seconds Fit ARIMA: order=(1, 2, 0) seasonal_order=(1, 0, 0, 12); AIC=1510.116, BIC=1520.025, Fit time=0.138 seconds Fit ARIMA: order=(0, 2, 1) seasonal_order=(0, 0, 1, 12); AIC=1510.352, BIC=1520.262, Fit time=0.059 seconds Fit ARIMA: order=(0, 2, 0) seasonal_order=(1, 0, 0, 12); AIC=1508.104, BIC=1515.536, Fit time=0.033 seconds Fit ARIMA: order=(0, 2, 0) seasonal_order=(0, 0, 1, 12); AIC=1508.104, BIC=1515.536, Fit time=0.035 seconds Fit ARIMA: order=(0, 2, 0) seasonal_order=(1, 0, 1, 12); AIC=1510.102, BIC=1520.011, Fit time=0.105 seconds Fit ARIMA: order=(1, 2, 0) seasonal_order=(0, 0, 0, 12); AIC=1508.281, BIC=1515.713, Fit time=0.073 seconds Fit ARIMA: order=(0, 2, 1) seasonal_order=(0, 0, 0, 12); AIC=1508.530, BIC=1515.962, Fit time=0.034 seconds Fit ARIMA: order=(1, 2, 1) seasonal_order=(0, 0, 0, 12); AIC=1510.655, BIC=1520.565, Fit time=0.047 seconds Total fit time: 0.636 seconds Fit ARIMA: order=(1, 2, 1) seasonal_order=(1, 0, 1, 12); AIC=1448.639, BIC=1463.503, Fit time=0.190 seconds Fit ARIMA: order=(0, 2, 0) seasonal_order=(0, 0, 0, 12); AIC=1443.675, BIC=1448.630, Fit time=0.009 seconds Fit ARIMA: order=(1, 2, 0) seasonal_order=(1, 0, 0, 12); AIC=1445.062, BIC=1454.972, Fit time=0.115 seconds Fit ARIMA: order=(0, 2, 1) seasonal_order=(0, 0, 1, 12); AIC=1445.241, BIC=1455.151, Fit time=0.061 seconds Fit ARIMA: order=(0, 2, 0) seasonal_order=(1, 0, 0, 12); AIC=1443.040, BIC=1450.472, Fit time=0.075 seconds Fit ARIMA: order=(0, 2, 0) seasonal_order=(1, 0, 1, 12); AIC=1444.481, BIC=1454.391, Fit time=0.206 seconds Fit ARIMA: order=(0, 2, 0) seasonal_order=(2, 0, 1, 12); AIC=1445.108, BIC=1457.494, Fit time=0.719 seconds Fit ARIMA: order=(0, 2, 1) seasonal_order=(1, 0, 0, 12); AIC=1445.299, BIC=1455.208, Fit time=0.137 seconds Fit ARIMA: order=(1, 2, 1) seasonal_order=(1, 0, 0, 12); AIC=1447.307, BIC=1459.694, Fit time=0.086 seconds Fit ARIMA: order=(0, 2, 0) seasonal_order=(2, 0, 0, 12); AIC=1443.580, BIC=1453.490, Fit time=0.253 seconds Total fit time: 1.857 seconds Fit ARIMA: order=(1, 1, 1) seasonal_order=(1, 0, 1, 12); AIC=1934.729, BIC=1949.661, Fit time=0.385 seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(0, 0, 0, 12); AIC=1861.785, BIC=1866.762, Fit time=0.009 seconds Fit ARIMA: order=(1, 1, 0) seasonal_order=(1, 0, 0, 12); AIC=1986.297, BIC=1996.251, Fit time=0.087 seconds Fit ARIMA: order=(0, 1, 1) seasonal_order=(0, 0, 1, 12); AIC=nan, BIC=nan, Fit time=nan seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(1, 0, 0, 12); AIC=1862.165, BIC=1869.631, Fit time=0.180 seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(0, 0, 1, 12); AIC=1861.954, BIC=1869.419, Fit time=0.047 seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(1, 0, 1, 12); AIC=1862.995, BIC=1872.950, Fit time=0.227 seconds Fit ARIMA: order=(1, 1, 0) seasonal_order=(0, 0, 0, 12); AIC=1984.309, BIC=1991.775, Fit time=0.033 seconds Fit ARIMA: order=(0, 1, 1) seasonal_order=(0, 0, 0, 12); AIC=nan, BIC=nan, Fit time=nan seconds Fit ARIMA: order=(1, 1, 1) seasonal_order=(0, 0, 0, 12); AIC=1865.780, BIC=1875.734, Fit time=0.322 seconds Total fit time: 1.297 seconds Fit ARIMA: order=(1, 1, 1) seasonal_order=(1, 0, 1, 12); AIC=1860.451, BIC=1875.382, Fit time=0.423 seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(0, 0, 0, 12); AIC=1824.686, BIC=1829.664, Fit time=0.014 seconds Fit ARIMA: order=(1, 1, 0) seasonal_order=(1, 0, 0, 12); AIC=1833.945, BIC=1843.900, Fit time=0.333 seconds Fit ARIMA: order=(0, 1, 1) seasonal_order=(0, 0, 1, 12); AIC=1849.111, BIC=1859.065, Fit time=0.193 seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(1, 0, 0, 12); AIC=1825.803, BIC=1833.269, Fit time=0.081 seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(0, 0, 1, 12); AIC=1825.650, BIC=1833.116, Fit time=0.055 seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(1, 0, 1, 12); AIC=1827.039, BIC=1836.994, Fit time=0.130 seconds Fit ARIMA: order=(1, 1, 0) seasonal_order=(0, 0, 0, 12); AIC=1874.773, BIC=1882.239, Fit time=0.058 seconds Fit ARIMA: order=(0, 1, 1) seasonal_order=(0, 0, 0, 12); AIC=1848.255, BIC=1855.721, Fit time=0.079 seconds Fit ARIMA: order=(1, 1, 1) seasonal_order=(0, 0, 0, 12); AIC=1881.852, BIC=1891.807, Fit time=0.129 seconds Total fit time: 1.497 seconds Fit ARIMA: order=(1, 2, 1) seasonal_order=(1, 0, 1, 12); AIC=1835.562, BIC=1850.426, Fit time=0.457 seconds Fit ARIMA: order=(0, 2, 0) seasonal_order=(0, 0, 0, 12); AIC=1829.614, BIC=1834.569, Fit time=0.010 seconds Fit ARIMA: order=(1, 2, 0) seasonal_order=(1, 0, 0, 12); AIC=1832.050, BIC=1841.959, Fit time=0.180 seconds Fit ARIMA: order=(0, 2, 1) seasonal_order=(0, 0, 1, 12); AIC=1832.283, BIC=1842.192, Fit time=0.134 seconds Fit ARIMA: order=(0, 2, 0) seasonal_order=(1, 0, 0, 12); AIC=1831.401, BIC=1838.833, Fit time=0.053 seconds Fit ARIMA: order=(0, 2, 0) seasonal_order=(0, 0, 1, 12); AIC=1831.400, BIC=1838.832, Fit time=0.062 seconds Fit ARIMA: order=(0, 2, 0) seasonal_order=(1, 0, 1, 12); AIC=1833.349, BIC=1843.258, Fit time=0.088 seconds Fit ARIMA: order=(1, 2, 0) seasonal_order=(0, 0, 0, 12); AIC=1830.273, BIC=1837.705, Fit time=0.046 seconds Fit ARIMA: order=(0, 2, 1) seasonal_order=(0, 0, 0, 12); AIC=1830.517, BIC=1837.949, Fit time=0.062 seconds Fit ARIMA: order=(1, 2, 1) seasonal_order=(0, 0, 0, 12); AIC=1831.983, BIC=1841.892, Fit time=0.199 seconds Total fit time: 1.294 seconds Fit ARIMA: order=(1, 1, 1) seasonal_order=(1, 0, 1, 12); AIC=1927.213, BIC=1942.145, Fit time=0.671 seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(0, 0, 0, 12); AIC=1910.265, BIC=1915.242, Fit time=0.009 seconds Fit ARIMA: order=(1, 1, 0) seasonal_order=(1, 0, 0, 12); AIC=1993.788, BIC=2003.743, Fit time=0.072 seconds Fit ARIMA: order=(0, 1, 1) seasonal_order=(0, 0, 1, 12); AIC=nan, BIC=nan, Fit time=nan seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(1, 0, 0, 12); AIC=1912.182, BIC=1919.648, Fit time=0.040 seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(0, 0, 1, 12); AIC=1912.203, BIC=1919.669, Fit time=0.036 seconds Fit ARIMA: order=(0, 1, 0) seasonal_order=(1, 0, 1, 12); AIC=1913.990, BIC=1923.944, Fit time=0.107 seconds Fit ARIMA: order=(1, 1, 0) seasonal_order=(0, 0, 0, 12); AIC=1920.071, BIC=1927.537, Fit time=0.105 seconds Fit ARIMA: order=(0, 1, 1) seasonal_order=(0, 0, 0, 12); AIC=nan, BIC=nan, Fit time=nan seconds Fit ARIMA: order=(1, 1, 1) seasonal_order=(0, 0, 0, 12); AIC=1995.354, BIC=2005.308, Fit time=0.107 seconds Total fit time: 1.158 seconds Fit ARIMA: order=(1, 2, 1) seasonal_order=(1, 0, 1, 12); AIC=1584.520, BIC=1599.384, Fit time=0.674 seconds Fit ARIMA: order=(0, 2, 0) seasonal_order=(0, 0, 0, 12); AIC=1581.597, BIC=1586.552, Fit time=0.017 seconds Fit ARIMA: order=(1, 2, 0) seasonal_order=(1, 0, 0, 12); AIC=1583.223, BIC=1593.132, Fit time=0.085 seconds Fit ARIMA: order=(0, 2, 1) seasonal_order=(0, 0, 1, 12); AIC=1583.506, BIC=1593.415, Fit time=0.153 seconds Fit ARIMA: order=(0, 2, 0) seasonal_order=(1, 0, 0, 12); AIC=1583.551, BIC=1590.983, Fit time=0.191 seconds Fit ARIMA: order=(0, 2, 0) seasonal_order=(0, 0, 1, 12); AIC=1583.550, BIC=1590.982, Fit time=0.053 seconds Fit ARIMA: order=(0, 2, 0) seasonal_order=(1, 0, 1, 12); AIC=1584.344, BIC=1594.253, Fit time=0.321 seconds Fit ARIMA: order=(1, 2, 0) seasonal_order=(0, 0, 0, 12); AIC=1581.266, BIC=1588.698, Fit time=0.024 seconds Fit ARIMA: order=(1, 2, 1) seasonal_order=(0, 0, 0, 12); AIC=1581.886, BIC=1591.796, Fit time=0.177 seconds Fit ARIMA: order=(2, 2, 1) seasonal_order=(0, 0, 0, 12); AIC=1609.935, BIC=1622.322, Fit time=0.309 seconds Fit ARIMA: order=(1, 2, 0) seasonal_order=(0, 0, 1, 12); AIC=1583.222, BIC=1593.131, Fit time=0.064 seconds Fit ARIMA: order=(1, 2, 0) seasonal_order=(1, 0, 1, 12); AIC=1584.088, BIC=1596.474, Fit time=0.273 seconds Fit ARIMA: order=(2, 2, 0) seasonal_order=(0, 0, 0, 12); AIC=1592.641, BIC=1602.551, Fit time=0.225 seconds Total fit time: 2.574 seconds Fit ARIMA: order=(1, 2, 1) seasonal_order=(1, 0, 1, 12); AIC=1623.058, BIC=1637.922, Fit time=0.270 seconds Fit ARIMA: order=(0, 2, 0) seasonal_order=(0, 0, 0, 12); AIC=1633.266, BIC=1638.220, Fit time=0.012 seconds Fit ARIMA: order=(1, 2, 0) seasonal_order=(1, 0, 0, 12); AIC=1620.803, BIC=1630.713, Fit time=0.054 seconds Fit ARIMA: order=(0, 2, 1) seasonal_order=(0, 0, 1, 12); AIC=1621.313, BIC=1631.222, Fit time=0.134 seconds Fit ARIMA: order=(1, 2, 0) seasonal_order=(0, 0, 0, 12); AIC=1618.816, BIC=1626.248, Fit time=0.042 seconds Fit ARIMA: order=(1, 2, 0) seasonal_order=(0, 0, 1, 12); AIC=1620.803, BIC=1630.713, Fit time=0.062 seconds Fit ARIMA: order=(1, 2, 0) seasonal_order=(1, 0, 1, 12); AIC=1622.816, BIC=1635.202, Fit time=0.080 seconds Fit ARIMA: order=(2, 2, 0) seasonal_order=(0, 0, 0, 12); AIC=1625.331, BIC=1635.241, Fit time=0.038 seconds Fit ARIMA: order=(1, 2, 1) seasonal_order=(0, 0, 0, 12); AIC=1619.049, BIC=1628.958, Fit time=0.161 seconds Fit ARIMA: order=(2, 2, 1) seasonal_order=(0, 0, 0, 12); AIC=1639.944, BIC=1652.330, Fit time=0.133 seconds Total fit time: 0.987 seconds
def join_forecast_values(original_dataset, forecast_dataset):
# Converting the Forecast values to a Dataframe
forecast = pd.DataFrame(forecast_dataset)
# Changing the Column Names to Date
forecast.columns= pd.date_range(start= '2017-07-01', end='2019-04-01', freq='MS')
#Concatenating the Forecasted Dataframe to the Original Dataframe
final_df=pd.concat([original_dataset,forecast], axis=1)
# Converting the final month column to String and then Renaming
final_df.columns = final_df.columns.astype(str)
final_df.rename( columns={"2019-04-01 00:00:00": "Current Month Cost"}, inplace=True)
return final_df
# Joining the Forecasted Data to the Actual Group Joined Data created before
final_df=join_forecast_values(df, final_forecast)
final_df
zipcode | id | scrape_id | host_id | host_acceptance_rate | host_listings_count | host_total_listings_count | latitude | longitude | accommodates | ... | 2018-07-01 00:00:00 | 2018-08-01 00:00:00 | 2018-09-01 00:00:00 | 2018-10-01 00:00:00 | 2018-11-01 00:00:00 | 2018-12-01 00:00:00 | 2019-01-01 00:00:00 | 2019-02-01 00:00:00 | 2019-03-01 00:00:00 | Current Month Cost | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 10003 | 9.124661e+06 | 2.017050e+13 | 2.496037e+07 | NaN | 1.646018 | 1.646018 | 40.728902 | -73.988258 | 4.566372 | ... | 2.281265e+06 | 2.291593e+06 | 2.301921e+06 | 2.312249e+06 | 2.322578e+06 | 2.332906e+06 | 2.343234e+06 | 2.353562e+06 | 2.363890e+06 | 2.374218e+06 |
1 | 10011 | 9.889107e+06 | 2.017050e+13 | 3.005599e+07 | NaN | 1.448276 | 1.448276 | 40.742371 | -73.999752 | 4.517241 | ... | 2.641337e+06 | 2.653717e+06 | 2.666097e+06 | 2.678476e+06 | 2.690856e+06 | 2.703236e+06 | 2.715616e+06 | 2.727996e+06 | 2.740375e+06 | 2.752755e+06 |
2 | 10013 | 1.054726e+07 | 2.017050e+13 | 2.783653e+07 | NaN | 1.412698 | 1.412698 | 40.719555 | -74.004075 | 4.281250 | ... | 3.469549e+06 | 3.481322e+06 | 3.493096e+06 | 3.504869e+06 | 3.516642e+06 | 3.528415e+06 | 3.540188e+06 | 3.551961e+06 | 3.563734e+06 | 3.575507e+06 |
3 | 10014 | 8.654217e+06 | 2.017050e+13 | 2.054620e+07 | NaN | 1.537634 | 1.537634 | 40.734209 | -74.004240 | 4.344086 | ... | 2.643481e+06 | 2.655164e+06 | 2.666847e+06 | 2.678530e+06 | 2.690213e+06 | 2.701897e+06 | 2.713580e+06 | 2.725263e+06 | 2.736946e+06 | 2.748629e+06 |
4 | 10021 | 1.022891e+07 | 2.017050e+13 | 3.568867e+07 | NaN | 1.526316 | 1.526316 | 40.768441 | -73.956628 | 4.263158 | ... | 1.910792e+06 | 1.918115e+06 | 1.925437e+06 | 1.932760e+06 | 1.940082e+06 | 1.947404e+06 | 1.954727e+06 | 1.962049e+06 | 1.969372e+06 | 1.976694e+06 |
5 | 10022 | 9.963782e+06 | 2.017050e+13 | 4.129974e+07 | NaN | 1.394737 | 1.394737 | 40.758886 | -73.966700 | 5.447368 | ... | 2.135103e+06 | 2.143065e+06 | 2.151027e+06 | 2.158989e+06 | 2.166951e+06 | 2.174912e+06 | 2.182874e+06 | 2.190836e+06 | 2.198798e+06 | 2.206760e+06 |
6 | 10023 | 8.424752e+06 | 2.017050e+13 | 3.053048e+07 | NaN | 2.888889 | 2.888889 | 40.777581 | -73.981486 | 4.666667 | ... | 2.269437e+06 | 2.279217e+06 | 2.288997e+06 | 2.298776e+06 | 2.308556e+06 | 2.318336e+06 | 2.328116e+06 | 2.337896e+06 | 2.347675e+06 | 2.357455e+06 |
7 | 10025 | 7.849437e+06 | 2.017050e+13 | 2.567103e+07 | NaN | 1.571429 | 1.571429 | 40.798025 | -73.966355 | 4.821429 | ... | 1.521226e+06 | 1.528166e+06 | 1.535107e+06 | 1.542047e+06 | 1.548988e+06 | 1.555928e+06 | 1.562869e+06 | 1.569809e+06 | 1.576749e+06 | 1.583690e+06 |
8 | 10028 | 1.038785e+07 | 2.017050e+13 | 3.790560e+07 | NaN | 1.375000 | 1.375000 | 40.775367 | -73.950934 | 4.281250 | ... | 2.223818e+06 | 2.234581e+06 | 2.245344e+06 | 2.256107e+06 | 2.266870e+06 | 2.277633e+06 | 2.288396e+06 | 2.299158e+06 | 2.309921e+06 | 2.320684e+06 |
9 | 10036 | 8.401201e+06 | 2.017050e+13 | 3.853206e+07 | NaN | 73.500000 | 73.500000 | 40.754810 | -73.987824 | 4.750000 | ... | 1.782180e+06 | 1.787509e+06 | 1.792838e+06 | 1.798167e+06 | 1.803497e+06 | 1.808826e+06 | 1.814155e+06 | 1.819484e+06 | 1.824813e+06 | 1.830143e+06 |
10 | 10128 | 1.079017e+07 | 2.017050e+13 | 2.984589e+07 | NaN | 1.321429 | 1.321429 | 40.781842 | -73.949537 | 4.357143 | ... | 2.723528e+06 | 2.800994e+06 | 2.879236e+06 | 2.958255e+06 | 3.038049e+06 | 3.118619e+06 | 3.199966e+06 | 3.282089e+06 | 3.364987e+06 | 3.448662e+06 |
11 | 10304 | 1.590640e+06 | 2.017050e+13 | 7.534770e+06 | NaN | 1.000000 | 1.000000 | 40.626721 | -74.076721 | 3.500000 | ... | 4.259331e+05 | 4.339654e+05 | 4.420723e+05 | 4.502538e+05 | 4.585099e+05 | 4.668406e+05 | 4.752459e+05 | 4.837257e+05 | 4.922802e+05 | 5.009092e+05 |
12 | 10305 | 1.368595e+07 | 2.017050e+13 | 5.535834e+07 | NaN | 1.555556 | 1.555556 | 40.596360 | -74.077371 | 4.555556 | ... | 5.132489e+05 | 5.206102e+05 | 5.280545e+05 | 5.355818e+05 | 5.431920e+05 | 5.508852e+05 | 5.586614e+05 | 5.665205e+05 | 5.744625e+05 | 5.824875e+05 |
13 | 10306 | 8.328929e+06 | 2.017050e+13 | 5.237069e+07 | NaN | 2.000000 | 2.000000 | 40.568879 | -74.111565 | 5.500000 | ... | 4.251977e+05 | 4.312205e+05 | 4.373091e+05 | 4.434636e+05 | 4.496841e+05 | 4.559705e+05 | 4.623227e+05 | 4.687409e+05 | 4.752250e+05 | 4.817750e+05 |
14 | 10312 | 1.800434e+07 | 2.017050e+13 | 1.197988e+08 | NaN | 1.000000 | 1.000000 | 40.546331 | -74.183701 | 2.000000 | ... | 4.012734e+05 | 4.050852e+05 | 4.089326e+05 | 4.128157e+05 | 4.167345e+05 | 4.206890e+05 | 4.246794e+05 | 4.287059e+05 | 4.327680e+05 | 4.368654e+05 |
15 | 11201 | 9.298929e+06 | 2.017050e+13 | 2.327364e+07 | NaN | 1.451613 | 1.451613 | 40.692611 | -73.990576 | 4.451613 | ... | 1.519719e+06 | 1.527336e+06 | 1.534953e+06 | 1.542570e+06 | 1.550187e+06 | 1.557803e+06 | 1.565420e+06 | 1.573037e+06 | 1.580654e+06 | 1.588271e+06 |
16 | 11215 | 9.035422e+06 | 2.017050e+13 | 2.460656e+07 | NaN | 1.422764 | 1.422764 | 40.666241 | -73.984527 | 4.266129 | ... | 1.149896e+06 | 1.155980e+06 | 1.162064e+06 | 1.168148e+06 | 1.174233e+06 | 1.180317e+06 | 1.186401e+06 | 1.192485e+06 | 1.198570e+06 | 1.204654e+06 |
17 | 11217 | 6.866520e+06 | 2.017050e+13 | 1.988437e+07 | NaN | 1.283333 | 1.283333 | 40.681377 | -73.979068 | 3.983333 | ... | 1.665473e+06 | 1.695445e+06 | 1.725709e+06 | 1.756264e+06 | 1.787109e+06 | 1.818245e+06 | 1.849673e+06 | 1.881391e+06 | 1.913400e+06 | 1.945700e+06 |
18 | 11231 | 7.697324e+06 | 2.017050e+13 | 2.171201e+07 | NaN | 1.550725 | 1.550725 | 40.680632 | -74.000049 | 4.391304 | ... | 1.292235e+06 | 1.299107e+06 | 1.305979e+06 | 1.312851e+06 | 1.319722e+06 | 1.326594e+06 | 1.333466e+06 | 1.340338e+06 | 1.347210e+06 | 1.354082e+06 |
19 | 11234 | 1.066553e+07 | 2.017050e+13 | 2.752097e+07 | NaN | 1.333333 | 1.333333 | 40.623074 | -73.925943 | 4.166667 | ... | 4.688470e+05 | 4.678265e+05 | 4.667486e+05 | 4.656135e+05 | 4.644211e+05 | 4.631714e+05 | 4.618644e+05 | 4.605001e+05 | 4.590786e+05 | 4.575997e+05 |
20 | 11434 | 9.374263e+06 | 2.017050e+13 | 4.863844e+07 | NaN | 1.000000 | 1.000000 | 40.662033 | -73.769800 | 4.000000 | ... | 4.165402e+05 | 4.197565e+05 | 4.230561e+05 | 4.264390e+05 | 4.299051e+05 | 4.334544e+05 | 4.370870e+05 | 4.408028e+05 | 4.446019e+05 | 4.484842e+05 |
21 rows × 317 columns
final_df.shape
(21, 317)
# Function to Extract the Neighbours
def extract_neighbours(from_data, to_data):
# Since a particular zipcode has only a particular "neighbourhood_group_cleansed", I am using the .first() method to
# extract the neighbourhood and convert them to a Dataframe
neighbours=from_data.groupby('zipcode').first()['neighbourhood_group_cleansed'].to_frame().reset_index()
# Merging the Extracted data to the to_data
to_data=to_data.merge(neighbours, on='zipcode')
return(to_data)
# Storing it as "final_df"
final_df= extract_neighbours(airbnb, final_df)
final_df.head()
zipcode | id | scrape_id | host_id | host_acceptance_rate | host_listings_count | host_total_listings_count | latitude | longitude | accommodates | ... | 2018-08-01 00:00:00 | 2018-09-01 00:00:00 | 2018-10-01 00:00:00 | 2018-11-01 00:00:00 | 2018-12-01 00:00:00 | 2019-01-01 00:00:00 | 2019-02-01 00:00:00 | 2019-03-01 00:00:00 | Current Month Cost | neighbourhood_group_cleansed | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 10003 | 9.124661e+06 | 2.017050e+13 | 2.496037e+07 | NaN | 1.646018 | 1.646018 | 40.728902 | -73.988258 | 4.566372 | ... | 2.291593e+06 | 2.301921e+06 | 2.312249e+06 | 2.322578e+06 | 2.332906e+06 | 2.343234e+06 | 2.353562e+06 | 2.363890e+06 | 2.374218e+06 | Manhattan |
1 | 10011 | 9.889107e+06 | 2.017050e+13 | 3.005599e+07 | NaN | 1.448276 | 1.448276 | 40.742371 | -73.999752 | 4.517241 | ... | 2.653717e+06 | 2.666097e+06 | 2.678476e+06 | 2.690856e+06 | 2.703236e+06 | 2.715616e+06 | 2.727996e+06 | 2.740375e+06 | 2.752755e+06 | Manhattan |
2 | 10013 | 1.054726e+07 | 2.017050e+13 | 2.783653e+07 | NaN | 1.412698 | 1.412698 | 40.719555 | -74.004075 | 4.281250 | ... | 3.481322e+06 | 3.493096e+06 | 3.504869e+06 | 3.516642e+06 | 3.528415e+06 | 3.540188e+06 | 3.551961e+06 | 3.563734e+06 | 3.575507e+06 | Manhattan |
3 | 10014 | 8.654217e+06 | 2.017050e+13 | 2.054620e+07 | NaN | 1.537634 | 1.537634 | 40.734209 | -74.004240 | 4.344086 | ... | 2.655164e+06 | 2.666847e+06 | 2.678530e+06 | 2.690213e+06 | 2.701897e+06 | 2.713580e+06 | 2.725263e+06 | 2.736946e+06 | 2.748629e+06 | Manhattan |
4 | 10021 | 1.022891e+07 | 2.017050e+13 | 3.568867e+07 | NaN | 1.526316 | 1.526316 | 40.768441 | -73.956628 | 4.263158 | ... | 1.918115e+06 | 1.925437e+06 | 1.932760e+06 | 1.940082e+06 | 1.947404e+06 | 1.954727e+06 | 1.962049e+06 | 1.969372e+06 | 1.976694e+06 | Manhattan |
5 rows × 318 columns
mis_value_percent(final_df)
zipcode 0.000000 id 0.000000 scrape_id 0.000000 host_id 0.000000 host_acceptance_rate 100.000000 host_listings_count 0.000000 host_total_listings_count 0.000000 latitude 0.000000 longitude 0.000000 accommodates 0.000000 bathrooms 0.000000 bedrooms 0.000000 beds 0.000000 square_feet 47.619048 price 0.000000 guests_included 0.000000 minimum_nights 0.000000 maximum_nights 0.000000 has_availability 100.000000 availability_30 0.000000 availability_60 0.000000 availability_90 0.000000 availability_365 0.000000 number_of_reviews 0.000000 review_scores_rating 0.000000 review_scores_accuracy 0.000000 review_scores_cleanliness 4.761905 review_scores_checkin 4.761905 review_scores_communication 4.761905 review_scores_location 4.761905 ... 2016-12 0.000000 2017-01 0.000000 2017-02 0.000000 2017-03 0.000000 2017-04 0.000000 2017-05 0.000000 2017-06 0.000000 2017-07-01 00:00:00 0.000000 2017-08-01 00:00:00 0.000000 2017-09-01 00:00:00 0.000000 2017-10-01 00:00:00 0.000000 2017-11-01 00:00:00 0.000000 2017-12-01 00:00:00 0.000000 2018-01-01 00:00:00 0.000000 2018-02-01 00:00:00 0.000000 2018-03-01 00:00:00 0.000000 2018-04-01 00:00:00 0.000000 2018-05-01 00:00:00 0.000000 2018-06-01 00:00:00 0.000000 2018-07-01 00:00:00 0.000000 2018-08-01 00:00:00 0.000000 2018-09-01 00:00:00 0.000000 2018-10-01 00:00:00 0.000000 2018-11-01 00:00:00 0.000000 2018-12-01 00:00:00 0.000000 2019-01-01 00:00:00 0.000000 2019-02-01 00:00:00 0.000000 2019-03-01 00:00:00 0.000000 Current Month Cost 0.000000 neighbourhood_group_cleansed 0.000000 Length: 318, dtype: float64
final_df=final_df.dropna(axis=1)
mis_value_percent(final_df)
zipcode 0.0 id 0.0 scrape_id 0.0 host_id 0.0 host_listings_count 0.0 host_total_listings_count 0.0 latitude 0.0 longitude 0.0 accommodates 0.0 bathrooms 0.0 bedrooms 0.0 beds 0.0 price 0.0 guests_included 0.0 minimum_nights 0.0 maximum_nights 0.0 availability_30 0.0 availability_60 0.0 availability_90 0.0 availability_365 0.0 number_of_reviews 0.0 review_scores_rating 0.0 review_scores_accuracy 0.0 calculated_host_listings_count 0.0 RegionID 0.0 City 0.0 State 0.0 Metro 0.0 CountyName 0.0 SizeRank 0.0 ... 2016-12 0.0 2017-01 0.0 2017-02 0.0 2017-03 0.0 2017-04 0.0 2017-05 0.0 2017-06 0.0 2017-07-01 00:00:00 0.0 2017-08-01 00:00:00 0.0 2017-09-01 00:00:00 0.0 2017-10-01 00:00:00 0.0 2017-11-01 00:00:00 0.0 2017-12-01 00:00:00 0.0 2018-01-01 00:00:00 0.0 2018-02-01 00:00:00 0.0 2018-03-01 00:00:00 0.0 2018-04-01 00:00:00 0.0 2018-05-01 00:00:00 0.0 2018-06-01 00:00:00 0.0 2018-07-01 00:00:00 0.0 2018-08-01 00:00:00 0.0 2018-09-01 00:00:00 0.0 2018-10-01 00:00:00 0.0 2018-11-01 00:00:00 0.0 2018-12-01 00:00:00 0.0 2019-01-01 00:00:00 0.0 2019-02-01 00:00:00 0.0 2019-03-01 00:00:00 0.0 Current Month Cost 0.0 neighbourhood_group_cleansed 0.0 Length: 174, dtype: float64
# Renaming the Price column to Revenue to make it clear
final_df.rename(columns={'price':'Revenue'}, inplace=True)
# Check for duplicate values
final_df.duplicated()
0 False 1 False 2 False 3 False 4 False 5 False 6 False 7 False 8 False 9 False 10 False 11 False 12 False 13 False 14 False 15 False 16 False 17 False 18 False 19 False 20 False dtype: bool
# First an account in Mapbox was created and the Public access token from that was used.
mapbox_access_token = 'pk.eyJ1IjoiaGFyaXNoMjMiLCJhIjoiY2p1N2h5bzliMXVrbDQwb2J3NmF1N3JoMSJ9.7BrKDWLpEdewUnq47GWOKA'
# Inside the Scattermapbox, I have specified the longitude and the latitude column for the map
data = [
go.Scattermapbox(
lat=final_df.latitude,
lon=final_df.longitude,
mode='markers',
marker=go.scattermapbox.Marker(
size=9
),text= final_df.zipcode.map(str) + " " + final_df.neighbourhood_group_cleansed # To Display both the Zipcode and Neighbourhood
,
)
]
# And then the access token defined above was assigned and then the center point of the map is specified
layout = go.Layout(
autosize=True,
hovermode='closest',
mapbox=go.layout.Mapbox(
accesstoken=mapbox_access_token,
bearing=0,
center=go.layout.mapbox.Center(
lat=40.666,
lon=-73.98
),
pitch=0,
zoom=9
),
)
fig = go.Figure(data=data, layout=layout)
iplot(fig)
final_df.neighbourhood_group_cleansed.value_counts()
Manhattan 11 Brooklyn 5 Staten Island 4 Queens 1 Name: neighbourhood_group_cleansed, dtype: int64
final_df[final_df['neighbourhood_group_cleansed']=='Manhattan']['review_scores_rating']
0 92.514445 1 93.691141 2 91.690927 3 93.658104 4 93.153286 5 88.233652 6 94.411979 7 92.062500 8 94.200047 9 90.763228 10 91.747068 Name: review_scores_rating, dtype: float64
#Function to Create a Scatterplot for Different Neighbours
def scatterplot(xaxis, yaxis, title):
data=[]
neighbours=['Manhattan', 'Brooklyn', 'Staten Island', 'Queens']
# Now looping over the Neighbours List
for i in neighbours:
trace = go.Scatter(x= final_df[final_df['neighbourhood_group_cleansed']==i][xaxis],
y = final_df[final_df['neighbourhood_group_cleansed']==i][yaxis],
mode= 'markers',
name= i,
marker = dict(symbol='circle', size=16),
text= final_df[final_df['neighbourhood_group_cleansed']==i]['zipcode'])
data.append(trace)
# The above loop will create 4 traces for 4 different neighbourhoods and append it to the data list
layout= go.Layout(title=title,
hovermode= 'closest',
xaxis= dict(
title= xaxis),
yaxis=dict(
title= yaxis
))
fig =go.Figure(data=data, layout=layout)
iplot(fig)
scatterplot('review_scores_accuracy','review_scores_rating', 'Review Analysis')
final_df[['zipcode','review_scores_rating', 'number_of_reviews']]
zipcode | review_scores_rating | number_of_reviews | |
---|---|---|---|
0 | 10003 | 92.514445 | 31.292486 |
1 | 10011 | 93.691141 | 15.945257 |
2 | 10013 | 91.690927 | 18.906767 |
3 | 10014 | 93.658104 | 17.774297 |
4 | 10021 | 93.153286 | 18.810971 |
5 | 10022 | 88.233652 | 29.905702 |
6 | 10023 | 94.411979 | 16.877034 |
7 | 10025 | 92.062500 | 20.180494 |
8 | 10028 | 94.200047 | 11.622623 |
9 | 10036 | 90.763228 | 22.604651 |
10 | 10128 | 91.747068 | 9.917482 |
11 | 10304 | 80.000000 | 1.000000 |
12 | 10305 | 93.333333 | 8.261438 |
13 | 10306 | 95.500000 | 71.500000 |
14 | 10312 | 97.857143 | 23.222222 |
15 | 11201 | 95.517568 | 22.670683 |
16 | 11215 | 95.111842 | 16.446481 |
17 | 11217 | 94.709933 | 20.022059 |
18 | 11231 | 94.353986 | 19.016645 |
19 | 11234 | 88.000000 | 13.833333 |
20 | 11434 | 95.000000 | 42.000000 |
# This function is used to create a plot with xaxis, yaxis and a variable whose size will change in the plot depending on the values
def size_scatter(xaxis, yaxis, size):
data=[]
neighbours=['Manhattan', 'Brooklyn', 'Staten Island', 'Queens']
for i in neighbours:
trace = go.Scatter(x= final_df[final_df['neighbourhood_group_cleansed']==i][xaxis],
y = final_df[final_df['neighbourhood_group_cleansed']==i][yaxis],
mode='markers',
marker=dict(
size=final_df[final_df['neighbourhood_group_cleansed']==i][size]
),
text = "%s"%(size)+'='+ final_df[final_df['neighbourhood_group_cleansed']==i][size].map(str) + '</br> ' + "Zipcode ="+ final_df[final_df['neighbourhood_group_cleansed']==i]['zipcode'].map(str)
)
data.append(trace)
layout= go.Layout(
title= 'Scatter Plot with Size = %s' %(size),
hovermode= 'closest',
xaxis= dict(type='category',
ticklen= 5,
zeroline= False,
gridwidth= 2,
),
yaxis=dict(
title= '%s'%(yaxis),
ticklen= 5,
gridwidth= 2,
),
showlegend= False
)
fig = go.Figure(data=data, layout=layout)
iplot(fig)
size_scatter('neighbourhood_group_cleansed', 'review_scores_rating', 'number_of_reviews' )
cols(final_df)
['zipcode', 'id', 'scrape_id', 'host_id', 'host_listings_count', 'host_total_listings_count', 'latitude', 'longitude', 'accommodates', 'bathrooms', 'bedrooms', 'beds', 'Revenue', 'guests_included', 'minimum_nights', 'maximum_nights', 'availability_30', 'availability_60', 'availability_90', 'availability_365', 'number_of_reviews', 'review_scores_rating', 'review_scores_accuracy', 'calculated_host_listings_count', 'RegionID', 'City', 'State', 'Metro', 'CountyName', 'SizeRank', '2007-06', '2007-07', '2007-08', '2007-09', '2007-10', '2007-11', '2007-12', '2008-01', '2008-02', '2008-03', '2008-04', '2008-05', '2008-06', '2008-07', '2008-08', '2008-09', '2008-10', '2008-11', '2008-12', '2009-01', '2009-02', '2009-03', '2009-04', '2009-05', '2009-06', '2009-07', '2009-08', '2009-09', '2009-10', '2009-11', '2009-12', '2010-01', '2010-02', '2010-03', '2010-04', '2010-05', '2010-06', '2010-07', '2010-08', '2010-09', '2010-10', '2010-11', '2010-12', '2011-01', '2011-02', '2011-03', '2011-04', '2011-05', '2011-06', '2011-07', '2011-08', '2011-09', '2011-10', '2011-11', '2011-12', '2012-01', '2012-02', '2012-03', '2012-04', '2012-05', '2012-06', '2012-07', '2012-08', '2012-09', '2012-10', '2012-11', '2012-12', '2013-01', '2013-02', '2013-03', '2013-04', '2013-05', '2013-06', '2013-07', '2013-08', '2013-09', '2013-10', '2013-11', '2013-12', '2014-01', '2014-02', '2014-03', '2014-04', '2014-05', '2014-06', '2014-07', '2014-08', '2014-09', '2014-10', '2014-11', '2014-12', '2015-01', '2015-02', '2015-03', '2015-04', '2015-05', '2015-06', '2015-07', '2015-08', '2015-09', '2015-10', '2015-11', '2015-12', '2016-01', '2016-02', '2016-03', '2016-04', '2016-05', '2016-06', '2016-07', '2016-08', '2016-09', '2016-10', '2016-11', '2016-12', '2017-01', '2017-02', '2017-03', '2017-04', '2017-05', '2017-06', '2017-07-01 00:00:00', '2017-08-01 00:00:00', '2017-09-01 00:00:00', '2017-10-01 00:00:00', '2017-11-01 00:00:00', '2017-12-01 00:00:00', '2018-01-01 00:00:00', '2018-02-01 00:00:00', '2018-03-01 00:00:00', '2018-04-01 00:00:00', '2018-05-01 00:00:00', '2018-06-01 00:00:00', '2018-07-01 00:00:00', '2018-08-01 00:00:00', '2018-09-01 00:00:00', '2018-10-01 00:00:00', '2018-11-01 00:00:00', '2018-12-01 00:00:00', '2019-01-01 00:00:00', '2019-02-01 00:00:00', '2019-03-01 00:00:00', 'Current Month Cost', 'neighbourhood_group_cleansed']
final_df['accommodates']
0 4.566372 1 4.517241 2 4.281250 3 4.344086 4 4.263158 5 5.447368 6 4.666667 7 4.821429 8 4.281250 9 4.750000 10 4.357143 11 3.500000 12 4.555556 13 5.500000 14 2.000000 15 4.451613 16 4.266129 17 3.983333 18 4.391304 19 4.166667 20 4.000000 Name: accommodates, dtype: float64
# Function to Create Barplots based on the Column Name Input
def barplots_with_columnnames(columnlist, title):
data=[]
columns= columnlist
# This loops over the column list which is specified while using the function and create a trace for each of them
for i in columns:
trace= go.Bar(x= final_df.zipcode.astype(str).tolist(),
y= final_df[i],
name=i)
data.append(trace)
# In the Layout, xaxis type='category' will convert the numerical Zipcodes in Xaxis as Categories
layout=go.Layout(title='Bar Plot of %s'%(title) , xaxis=dict(type='category'), yaxis= dict( title='Days Available'))
fig= go.Figure(data=data, layout=layout)
iplot(fig)
barplots_with_columnnames(['availability_30','availability_60','availability_90','availability_365'], 'Availability')
# Using the scatterplot function create above to plot the graph between availability and revenue
scatterplot('availability_90','Revenue', 'Availability-Revenue Analysis')
# Using the scatterplot function create above to plot the graph between availability and Cost
scatterplot('availability_90','Current Month Cost', 'Availability-Cost Analysis')
final_df.head()
zipcode | id | scrape_id | host_id | host_listings_count | host_total_listings_count | latitude | longitude | accommodates | bathrooms | ... | 2018-08-01 00:00:00 | 2018-09-01 00:00:00 | 2018-10-01 00:00:00 | 2018-11-01 00:00:00 | 2018-12-01 00:00:00 | 2019-01-01 00:00:00 | 2019-02-01 00:00:00 | 2019-03-01 00:00:00 | Current Month Cost | neighbourhood_group_cleansed | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 10003 | 9.124661e+06 | 2.017050e+13 | 2.496037e+07 | 1.646018 | 1.646018 | 40.728902 | -73.988258 | 4.566372 | 1.225664 | ... | 2.291593e+06 | 2.301921e+06 | 2.312249e+06 | 2.322578e+06 | 2.332906e+06 | 2.343234e+06 | 2.353562e+06 | 2.363890e+06 | 2.374218e+06 | Manhattan |
1 | 10011 | 9.889107e+06 | 2.017050e+13 | 3.005599e+07 | 1.448276 | 1.448276 | 40.742371 | -73.999752 | 4.517241 | 1.337209 | ... | 2.653717e+06 | 2.666097e+06 | 2.678476e+06 | 2.690856e+06 | 2.703236e+06 | 2.715616e+06 | 2.727996e+06 | 2.740375e+06 | 2.752755e+06 | Manhattan |
2 | 10013 | 1.054726e+07 | 2.017050e+13 | 2.783653e+07 | 1.412698 | 1.412698 | 40.719555 | -74.004075 | 4.281250 | 1.359375 | ... | 3.481322e+06 | 3.493096e+06 | 3.504869e+06 | 3.516642e+06 | 3.528415e+06 | 3.540188e+06 | 3.551961e+06 | 3.563734e+06 | 3.575507e+06 | Manhattan |
3 | 10014 | 8.654217e+06 | 2.017050e+13 | 2.054620e+07 | 1.537634 | 1.537634 | 40.734209 | -74.004240 | 4.344086 | 1.182796 | ... | 2.655164e+06 | 2.666847e+06 | 2.678530e+06 | 2.690213e+06 | 2.701897e+06 | 2.713580e+06 | 2.725263e+06 | 2.736946e+06 | 2.748629e+06 | Manhattan |
4 | 10021 | 1.022891e+07 | 2.017050e+13 | 3.568867e+07 | 1.526316 | 1.526316 | 40.768441 | -73.956628 | 4.263158 | 1.052632 | ... | 1.918115e+06 | 1.925437e+06 | 1.932760e+06 | 1.940082e+06 | 1.947404e+06 | 1.954727e+06 | 1.962049e+06 | 1.969372e+06 | 1.976694e+06 | Manhattan |
5 rows × 174 columns
# For plotting the Correlation, a function to normalize columns was created
def normalize(dataset):
return((dataset- min(dataset))/ (max(dataset)- min(dataset)))
Since all the variables are in different units, it will be essential to conver them to a range between 0-1 and then compare
# Normalizing the Revenue, Cost and Review:
normalized_revenue= normalize(final_df['Revenue'])
normalized_cost= normalize(final_df['Current Month Cost'])
normalized_review= normalize(final_df['review_scores_rating'])
normalized_availability= normalize(final_df['availability_90'])
scatter_df = pd.concat([normalized_revenue, normalized_cost, normalized_review,normalized_availability], axis=1)
# plotting the Scatterplot Matrix
import plotly.figure_factory as ff
# Here the Index is arranged so that the scale on the side starts from 1 and ends at 21 instead of the default (0 to 21)
scatter_df['index'] = np.arange(1,len(scatter_df)+1)
fig = ff.create_scatterplotmatrix(scatter_df, diag ='box', index = 'index' , colormap ='Portland',
colormap_type = 'seq', height = 900, width = 900)
iplot(fig)
# Function to Create Violin plots
def violinplot(columnname):
data=[]
neighbours=['Manhattan', 'Brooklyn', 'Staten Island', 'Queens']
# Loop to create 4 traces to plot 4 violin plots each for each neighbour.
for i in neighbours:
trace = { "type": 'violin',
"y": final_df[final_df['neighbourhood_group_cleansed']==i][columnname],
"name": i,
"box": {
"visible": True
},
"meanline": {
"visible": True
}
}
data.append(trace)
fig = {
"data": data,
"layout" : {
"title": "Violin plot of %s" %columnname,
"yaxis": {
"zeroline": False,
}
}
}
iplot(fig)
# Violin Plots of Revenue:
violinplot('Revenue')
# Violin plots of Cost:
violinplot('Current Month Cost')
# Scatterplot between Revenue and Cost and using the function create above to plot the graph
scatterplot('Revenue','Current Month Cost', 'Revenue-Cost Analysis')
final_df.head()
zipcode | id | scrape_id | host_id | host_listings_count | host_total_listings_count | latitude | longitude | accommodates | bathrooms | ... | 2018-08-01 00:00:00 | 2018-09-01 00:00:00 | 2018-10-01 00:00:00 | 2018-11-01 00:00:00 | 2018-12-01 00:00:00 | 2019-01-01 00:00:00 | 2019-02-01 00:00:00 | 2019-03-01 00:00:00 | Current Month Cost | neighbourhood_group_cleansed | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 10003 | 9.124661e+06 | 2.017050e+13 | 2.496037e+07 | 1.646018 | 1.646018 | 40.728902 | -73.988258 | 4.566372 | 1.225664 | ... | 2.291593e+06 | 2.301921e+06 | 2.312249e+06 | 2.322578e+06 | 2.332906e+06 | 2.343234e+06 | 2.353562e+06 | 2.363890e+06 | 2.374218e+06 | Manhattan |
1 | 10011 | 9.889107e+06 | 2.017050e+13 | 3.005599e+07 | 1.448276 | 1.448276 | 40.742371 | -73.999752 | 4.517241 | 1.337209 | ... | 2.653717e+06 | 2.666097e+06 | 2.678476e+06 | 2.690856e+06 | 2.703236e+06 | 2.715616e+06 | 2.727996e+06 | 2.740375e+06 | 2.752755e+06 | Manhattan |
2 | 10013 | 1.054726e+07 | 2.017050e+13 | 2.783653e+07 | 1.412698 | 1.412698 | 40.719555 | -74.004075 | 4.281250 | 1.359375 | ... | 3.481322e+06 | 3.493096e+06 | 3.504869e+06 | 3.516642e+06 | 3.528415e+06 | 3.540188e+06 | 3.551961e+06 | 3.563734e+06 | 3.575507e+06 | Manhattan |
3 | 10014 | 8.654217e+06 | 2.017050e+13 | 2.054620e+07 | 1.537634 | 1.537634 | 40.734209 | -74.004240 | 4.344086 | 1.182796 | ... | 2.655164e+06 | 2.666847e+06 | 2.678530e+06 | 2.690213e+06 | 2.701897e+06 | 2.713580e+06 | 2.725263e+06 | 2.736946e+06 | 2.748629e+06 | Manhattan |
4 | 10021 | 1.022891e+07 | 2.017050e+13 | 3.568867e+07 | 1.526316 | 1.526316 | 40.768441 | -73.956628 | 4.263158 | 1.052632 | ... | 1.918115e+06 | 1.925437e+06 | 1.932760e+06 | 1.940082e+06 | 1.947404e+06 | 1.954727e+06 | 1.962049e+06 | 1.969372e+06 | 1.976694e+06 | Manhattan |
5 rows × 174 columns
cols(final_df)
['zipcode', 'id', 'scrape_id', 'host_id', 'host_listings_count', 'host_total_listings_count', 'latitude', 'longitude', 'accommodates', 'bathrooms', 'bedrooms', 'beds', 'Revenue', 'guests_included', 'minimum_nights', 'maximum_nights', 'availability_30', 'availability_60', 'availability_90', 'availability_365', 'number_of_reviews', 'review_scores_rating', 'review_scores_accuracy', 'calculated_host_listings_count', 'RegionID', 'City', 'State', 'Metro', 'CountyName', 'SizeRank', '2007-06', '2007-07', '2007-08', '2007-09', '2007-10', '2007-11', '2007-12', '2008-01', '2008-02', '2008-03', '2008-04', '2008-05', '2008-06', '2008-07', '2008-08', '2008-09', '2008-10', '2008-11', '2008-12', '2009-01', '2009-02', '2009-03', '2009-04', '2009-05', '2009-06', '2009-07', '2009-08', '2009-09', '2009-10', '2009-11', '2009-12', '2010-01', '2010-02', '2010-03', '2010-04', '2010-05', '2010-06', '2010-07', '2010-08', '2010-09', '2010-10', '2010-11', '2010-12', '2011-01', '2011-02', '2011-03', '2011-04', '2011-05', '2011-06', '2011-07', '2011-08', '2011-09', '2011-10', '2011-11', '2011-12', '2012-01', '2012-02', '2012-03', '2012-04', '2012-05', '2012-06', '2012-07', '2012-08', '2012-09', '2012-10', '2012-11', '2012-12', '2013-01', '2013-02', '2013-03', '2013-04', '2013-05', '2013-06', '2013-07', '2013-08', '2013-09', '2013-10', '2013-11', '2013-12', '2014-01', '2014-02', '2014-03', '2014-04', '2014-05', '2014-06', '2014-07', '2014-08', '2014-09', '2014-10', '2014-11', '2014-12', '2015-01', '2015-02', '2015-03', '2015-04', '2015-05', '2015-06', '2015-07', '2015-08', '2015-09', '2015-10', '2015-11', '2015-12', '2016-01', '2016-02', '2016-03', '2016-04', '2016-05', '2016-06', '2016-07', '2016-08', '2016-09', '2016-10', '2016-11', '2016-12', '2017-01', '2017-02', '2017-03', '2017-04', '2017-05', '2017-06', '2017-07-01 00:00:00', '2017-08-01 00:00:00', '2017-09-01 00:00:00', '2017-10-01 00:00:00', '2017-11-01 00:00:00', '2017-12-01 00:00:00', '2018-01-01 00:00:00', '2018-02-01 00:00:00', '2018-03-01 00:00:00', '2018-04-01 00:00:00', '2018-05-01 00:00:00', '2018-06-01 00:00:00', '2018-07-01 00:00:00', '2018-08-01 00:00:00', '2018-09-01 00:00:00', '2018-10-01 00:00:00', '2018-11-01 00:00:00', '2018-12-01 00:00:00', '2019-01-01 00:00:00', '2019-02-01 00:00:00', '2019-03-01 00:00:00', 'Current Month Cost', 'neighbourhood_group_cleansed']
# Function to create KPIs:
def breakeven_calc(dataset):
#Calculating all the KPIs for Breakeven Analysis
dataset['Expected Return']= dataset['Revenue'] * 365 * 0.75
dataset['Breakeven Period']= dataset['Current Month Cost']/dataset['Expected Return']
return dataset
final_df=breakeven_calc(final_df)
final_df
zipcode | id | scrape_id | host_id | host_listings_count | host_total_listings_count | latitude | longitude | accommodates | bathrooms | ... | 2018-10-01 00:00:00 | 2018-11-01 00:00:00 | 2018-12-01 00:00:00 | 2019-01-01 00:00:00 | 2019-02-01 00:00:00 | 2019-03-01 00:00:00 | Current Month Cost | neighbourhood_group_cleansed | Expected Return | Breakeven Period | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 10003 | 9.124661e+06 | 2.017050e+13 | 2.496037e+07 | 1.646018 | 1.646018 | 40.728902 | -73.988258 | 4.566372 | 1.225664 | ... | 2.312249e+06 | 2.322578e+06 | 2.332906e+06 | 2.343234e+06 | 2.353562e+06 | 2.363890e+06 | 2.374218e+06 | Manhattan | 85085.376106 | 27.903949 |
1 | 10011 | 9.889107e+06 | 2.017050e+13 | 3.005599e+07 | 1.448276 | 1.448276 | 40.742371 | -73.999752 | 4.517241 | 1.337209 | ... | 2.678476e+06 | 2.690856e+06 | 2.703236e+06 | 2.715616e+06 | 2.727996e+06 | 2.740375e+06 | 2.752755e+06 | Manhattan | 99531.724138 | 27.657062 |
2 | 10013 | 1.054726e+07 | 2.017050e+13 | 2.783653e+07 | 1.412698 | 1.412698 | 40.719555 | -74.004075 | 4.281250 | 1.359375 | ... | 3.504869e+06 | 3.516642e+06 | 3.528415e+06 | 3.540188e+06 | 3.551961e+06 | 3.563734e+06 | 3.575507e+06 | Manhattan | 106719.726562 | 33.503710 |
3 | 10014 | 8.654217e+06 | 2.017050e+13 | 2.054620e+07 | 1.537634 | 1.537634 | 40.734209 | -74.004240 | 4.344086 | 1.182796 | ... | 2.678530e+06 | 2.690213e+06 | 2.701897e+06 | 2.713580e+06 | 2.725263e+06 | 2.736946e+06 | 2.748629e+06 | Manhattan | 89048.225806 | 30.866749 |
4 | 10021 | 1.022891e+07 | 2.017050e+13 | 3.568867e+07 | 1.526316 | 1.526316 | 40.768441 | -73.956628 | 4.263158 | 1.052632 | ... | 1.932760e+06 | 1.940082e+06 | 1.947404e+06 | 1.954727e+06 | 1.962049e+06 | 1.969372e+06 | 1.976694e+06 | Manhattan | 74964.276316 | 26.368485 |
5 | 10022 | 9.963782e+06 | 2.017050e+13 | 4.129974e+07 | 1.394737 | 1.394737 | 40.758886 | -73.966700 | 5.447368 | 1.276316 | ... | 2.158989e+06 | 2.166951e+06 | 2.174912e+06 | 2.182874e+06 | 2.190836e+06 | 2.198798e+06 | 2.206760e+06 | Manhattan | 100545.493421 | 21.947871 |
6 | 10023 | 8.424752e+06 | 2.017050e+13 | 3.053048e+07 | 2.888889 | 2.888889 | 40.777581 | -73.981486 | 4.666667 | 1.349206 | ... | 2.298776e+06 | 2.308556e+06 | 2.318336e+06 | 2.328116e+06 | 2.337896e+06 | 2.347675e+06 | 2.357455e+06 | Manhattan | 77936.190476 | 30.248528 |
7 | 10025 | 7.849437e+06 | 2.017050e+13 | 2.567103e+07 | 1.571429 | 1.571429 | 40.798025 | -73.966355 | 4.821429 | 1.232143 | ... | 1.542047e+06 | 1.548988e+06 | 1.555928e+06 | 1.562869e+06 | 1.569809e+06 | 1.576749e+06 | 1.583690e+06 | Manhattan | 78292.500000 | 20.227862 |
8 | 10028 | 1.038785e+07 | 2.017050e+13 | 3.790560e+07 | 1.375000 | 1.375000 | 40.775367 | -73.950934 | 4.281250 | 1.258065 | ... | 2.256107e+06 | 2.266870e+06 | 2.277633e+06 | 2.288396e+06 | 2.299158e+06 | 2.309921e+06 | 2.320684e+06 | Manhattan | 74040.820312 | 31.343308 |
9 | 10036 | 8.401201e+06 | 2.017050e+13 | 3.853206e+07 | 73.500000 | 73.500000 | 40.754810 | -73.987824 | 4.750000 | 1.416667 | ... | 1.798167e+06 | 1.803497e+06 | 1.808826e+06 | 1.814155e+06 | 1.819484e+06 | 1.824813e+06 | 1.830143e+06 | Manhattan | 122001.250000 | 15.001016 |
10 | 10128 | 1.079017e+07 | 2.017050e+13 | 2.984589e+07 | 1.321429 | 1.321429 | 40.781842 | -73.949537 | 4.357143 | 1.232143 | ... | 2.958255e+06 | 3.038049e+06 | 3.118619e+06 | 3.199966e+06 | 3.282089e+06 | 3.364987e+06 | 3.448662e+06 | Manhattan | 61696.406250 | 55.897300 |
11 | 10304 | 1.590640e+06 | 2.017050e+13 | 7.534770e+06 | 1.000000 | 1.000000 | 40.626721 | -74.076721 | 3.500000 | 1.250000 | ... | 4.502538e+05 | 4.585099e+05 | 4.668406e+05 | 4.752459e+05 | 4.837257e+05 | 4.922802e+05 | 5.009092e+05 | Staten Island | 26553.750000 | 18.863973 |
12 | 10305 | 1.368595e+07 | 2.017050e+13 | 5.535834e+07 | 1.555556 | 1.555556 | 40.596360 | -74.077371 | 4.555556 | 1.111111 | ... | 5.355818e+05 | 5.431920e+05 | 5.508852e+05 | 5.586614e+05 | 5.665205e+05 | 5.744625e+05 | 5.824875e+05 | Staten Island | 30447.083333 | 19.131143 |
13 | 10306 | 8.328929e+06 | 2.017050e+13 | 5.237069e+07 | 2.000000 | 2.000000 | 40.568879 | -74.111565 | 5.500000 | 1.250000 | ... | 4.434636e+05 | 4.496841e+05 | 4.559705e+05 | 4.623227e+05 | 4.687409e+05 | 4.752250e+05 | 4.817750e+05 | Staten Island | 25458.750000 | 18.923749 |
14 | 10312 | 1.800434e+07 | 2.017050e+13 | 1.197988e+08 | 1.000000 | 1.000000 | 40.546331 | -74.183701 | 2.000000 | 2.000000 | ... | 4.128157e+05 | 4.167345e+05 | 4.206890e+05 | 4.246794e+05 | 4.287059e+05 | 4.327680e+05 | 4.368654e+05 | Staten Island | 58856.250000 | 7.422584 |
15 | 11201 | 9.298929e+06 | 2.017050e+13 | 2.327364e+07 | 1.451613 | 1.451613 | 40.692611 | -73.990576 | 4.451613 | 1.250000 | ... | 1.542570e+06 | 1.550187e+06 | 1.557803e+06 | 1.565420e+06 | 1.573037e+06 | 1.580654e+06 | 1.588271e+06 | Brooklyn | 57112.197581 | 27.809660 |
16 | 11215 | 9.035422e+06 | 2.017050e+13 | 2.460656e+07 | 1.422764 | 1.422764 | 40.666241 | -73.984527 | 4.266129 | 1.189516 | ... | 1.168148e+06 | 1.174233e+06 | 1.180317e+06 | 1.186401e+06 | 1.192485e+06 | 1.198570e+06 | 1.204654e+06 | Brooklyn | 48641.401210 | 24.766020 |
17 | 11217 | 6.866520e+06 | 2.017050e+13 | 1.988437e+07 | 1.283333 | 1.283333 | 40.681377 | -73.979068 | 3.983333 | 1.228814 | ... | 1.756264e+06 | 1.787109e+06 | 1.818245e+06 | 1.849673e+06 | 1.881391e+06 | 1.913400e+06 | 1.945700e+06 | Brooklyn | 55603.187500 | 34.992598 |
18 | 11231 | 7.697324e+06 | 2.017050e+13 | 2.171201e+07 | 1.550725 | 1.550725 | 40.680632 | -74.000049 | 4.391304 | 1.176471 | ... | 1.312851e+06 | 1.319722e+06 | 1.326594e+06 | 1.333466e+06 | 1.340338e+06 | 1.347210e+06 | 1.354082e+06 | Brooklyn | 54896.793478 | 24.665958 |
19 | 11234 | 1.066553e+07 | 2.017050e+13 | 2.752097e+07 | 1.333333 | 1.333333 | 40.623074 | -73.925943 | 4.166667 | 1.083333 | ... | 4.656135e+05 | 4.644211e+05 | 4.631714e+05 | 4.618644e+05 | 4.605001e+05 | 4.590786e+05 | 4.575997e+05 | Brooklyn | 29382.500000 | 15.573886 |
20 | 11434 | 9.374263e+06 | 2.017050e+13 | 4.863844e+07 | 1.000000 | 1.000000 | 40.662033 | -73.769800 | 4.000000 | 1.000000 | ... | 4.264390e+05 | 4.299051e+05 | 4.334544e+05 | 4.370870e+05 | 4.408028e+05 | 4.446019e+05 | 4.484842e+05 | Queens | 42431.250000 | 10.569666 |
21 rows × 176 columns
# Function to Create Barplots based on the Column Name Input
def barplots_with_neighbourhoods(xaxis, yaxis, title):
data=[]
neighbours=['Manhattan', 'Brooklyn', 'Staten Island', 'Queens']
for i in neighbours:
trace= go.Bar(x= final_df[final_df['neighbourhood_group_cleansed']==i][xaxis],
y= final_df[final_df['neighbourhood_group_cleansed']==i][yaxis],
name=i)
data.append(trace)
# In the Layout, xaxis type='category' will convert the numerical Zipcodes in Xaxis as Categories
layout=go.Layout(title='Bar Plot of %s'%(title) , xaxis=dict(type='category'), yaxis= dict( title='Breakeven Period(in years)'))
fig= go.Figure(data=data, layout=layout)
iplot(fig)
# Using the above created function to plot the bar chart
barplots_with_neighbourhoods('zipcode','Breakeven Period', 'Breakeven Period for Zipcodes')
# Function to Create a 3-D Scatter plot:s
def threedscatterplot(xaxis, yaxis, zaxis):
data=[]
neighbours=['Manhattan', 'Brooklyn', 'Staten Island', 'Queens']
for i in neighbours:
trace = go.Scatter3d(
x=final_df[final_df['neighbourhood_group_cleansed']==i][xaxis],
y=final_df[final_df['neighbourhood_group_cleansed']==i][yaxis],
z=final_df[final_df['neighbourhood_group_cleansed']==i][zaxis],
mode='markers',
name=i,
marker=dict(
size=10),
text= final_df[final_df['neighbourhood_group_cleansed']==i]['zipcode']
)
data.append(trace)
layout= go.Layout(title='3D ScatterPlot Between X= %s, y= %s, and z= %s' %(xaxis,yaxis,zaxis),
hovermode= 'closest')
fig = go.Figure(data=data, layout=layout)
iplot(fig)
# threedscatterplot(xaxis, yaxis, zaxis)
threedscatterplot('Revenue', 'Current Month Cost', 'Breakeven Period')