Getting started

Once you've chosen your scenario, download the data from the Iowa website in csv format. Start by loading the data with pandas. You may need to parse the date columns appropriately.

In [1]:
import pandas as pd

#sales = pd.read_csv("iowa_liquor_sales_proj_2.csv")
sales = pd.read_csv("Iowa_Liquor_sales_sample_10pct.csv")
## Load the data into a DataFrame
# pd.read_csv()

## Transform the dates if needed, e.g.
# df["Date"] = pd.to_datetime(df["Date"], format="%m-%d-%y")

Explore the data

Perform some exploratory statistical analysis and make some plots, such as histograms of transaction totals, bottles sold, etc.

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats

plt.style.use('fivethirtyeight')
# plt.style.use('ggplot')
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import sklearn.linear_model as skl
import statsmodels.api as sm
from sklearn import datasets, linear_model
from sklearn.model_selection import cross_val_score, cross_val_predict
from sklearn import metrics
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score as roc_auc
from sklearn.model_selection import train_test_split
from matplotlib.colors import ListedColormap
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_moons, make_circles, make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.naive_bayes import GaussianNB
C:\ProgramData\Anaconda3\lib\site-packages\statsmodels\compat\pandas.py:56: FutureWarning: The pandas.core.datetools module is deprecated and will be removed in a future version. Please use the pandas.tseries module instead.
  from pandas.core import datetools
In [3]:
def eda_tool(df):

    import pandas as pd
    
    dict_list = []
    for col in df.columns:
        data = df[col]
        dict_ = {}
        # The null count for a column. Columns with no nulls are generally more interesting
        dict_.update({"null_count" : data.isnull().sum()})
        # Counting the unique values in a column
        # This is useful for seeing how interesting the column might be as a feature
        dict_.update({"unique_count" : len(data.unique())})
        # Finding the types of data in the column
        # This is useful for finding out potential problems with a column having strings and ints
        dict_.update({"data_type" : set([type(d).__name__ for d in data])})
        #dict_.update({"score" : match[1]})
        dict_list.append(dict_)
    eda_df = pd.DataFrame(dict_list)
    eda_df.index = df.columns
    eda_df = eda_df.sort_values(['null_count','unique_count'], ascending=[True, False])

    print("dataframe shape \n", df.shape, '\n')
    print("dataframe index \n", df.index[:5], '\n')
    print("dataframe describe \n", df.describe())

    return eda_df
In [4]:
eda_tool(sales)
dataframe shape 
 (270955, 18) 

dataframe index 
 RangeIndex(start=0, stop=5, step=1) 

dataframe describe 
         Store Number  County Number      Category  Vendor Number  \
count  270955.000000  269878.000000  2.708870e+05   270955.00000   
mean     3590.263701      57.231642  1.043888e+06      256.43443   
std       947.662050      27.341205  5.018211e+04      141.01489   
min      2106.000000       1.000000  1.011100e+06       10.00000   
25%      2604.000000      31.000000  1.012200e+06      115.00000   
50%      3722.000000      62.000000  1.031200e+06      260.00000   
75%      4378.000000      77.000000  1.062310e+06      380.00000   
max      9023.000000      99.000000  1.701100e+06      978.00000   

         Item Number  Bottle Volume (ml)   Bottles Sold  Volume Sold (Liters)  \
count  270955.000000       270955.000000  270955.000000         270955.000000   
mean    45974.963300          924.830341       9.871285              8.981351   
std     52757.043086          493.088489      24.040912             28.913690   
min       168.000000           50.000000       1.000000              0.100000   
25%     26827.000000          750.000000       2.000000              1.500000   
50%     38176.000000          750.000000       6.000000              5.250000   
75%     64573.000000         1000.000000      12.000000             10.500000   
max    995507.000000         6000.000000    2508.000000           2508.000000   

       Volume Sold (Gallons)  
count          270955.000000  
mean                2.372830  
std                 7.638182  
min                 0.030000  
25%                 0.400000  
50%                 1.390000  
75%                 2.770000  
max               662.540000  
Out[4]:
data_type null_count unique_count
Sale (Dollars) {str} 0 6580
Item Number {int64} 0 2696
Item Description {str} 0 2173
Store Number {int64} 0 1400
State Bottle Retail {str} 0 1112
State Bottle Cost {str} 0 1086
Zip Code {str} 0 415
City {str} 0 385
Date {str} 0 274
Volume Sold (Liters) {float64} 0 265
Volume Sold (Gallons) {float64} 0 261
Bottles Sold {int64} 0 137
Vendor Number {int64} 0 116
Bottle Volume (ml) {int64} 0 29
Category {float64} 68 84
Category Name {str, float} 632 72
County Number {float64} 1077 100
County {str, float} 1077 100
In [5]:
sales.isnull().sum()
Out[5]:
Invoice/Item Number          0
Date                         0
Store Number                 0
Store Name                   0
Address                      0
City                         0
Zip Code                     0
Store Location               0
County Number            10913
County                   10913
Category                   779
Category Name             6109
Vendor Number                0
Vendor Name                  0
Item Number                  0
Item Description             0
Pack                         0
Bottle Volume (ml)           0
State Bottle Cost            0
State Bottle Retail          0
Bottles Sold                 0
Sale (Dollars)               0
Volume Sold (Liters)         0
Volume Sold (Gallons)        0
dtype: int64

There are around 10,000 null values, as this data frame is 2.7 million rows I am dropping the null rows

In [5]:
sales = sales.dropna()
In [6]:
sales.duplicated().sum()
Out[6]:
35

There are no duplicated rows

In [7]:
sales.columns
Out[7]:
Index(['Date', 'Store Number', 'City', 'Zip Code', 'County Number', 'County',
       'Category', 'Category Name', 'Vendor Number', 'Item Number',
       'Item Description', 'Bottle Volume (ml)', 'State Bottle Cost',
       'State Bottle Retail', 'Bottles Sold', 'Sale (Dollars)',
       'Volume Sold (Liters)', 'Volume Sold (Gallons)'],
      dtype='object')

Cleaning up the column names

In [8]:
sales.columns = map(str.lower, sales.columns)
sales.columns = sales.columns.str.replace(r"[()]", "")
sales.columns = sales.columns.str.replace(r"[ ]", "_")
sales.columns = sales.columns.str.replace(r"[/]", "_")

Cleaning the columns and switching them to the proper data type

In [9]:
for col in sales.select_dtypes([np.object]):
    sales[col] = sales[col].str.lstrip('$')
In [10]:
sales["state_bottle_retail"] = sales["state_bottle_retail"].astype(float)
sales["sale_dollars"] = sales["sale_dollars"].astype(float)

Exploring the requirements of the project (Build models of total sales based on location, price per bottle, total bottles sold. You may find it useful to build models for each county, ZIP code, or city.) I will explore the following categories: Total Sales, County, Zip Code, City, Retail Price per bottle, Total Bottles Sold.

In [11]:
sales.columns
Out[11]:
Index(['date', 'store_number', 'city', 'zip_code', 'county_number', 'county',
       'category', 'category_name', 'vendor_number', 'item_number',
       'item_description', 'bottle_volume_ml', 'state_bottle_cost',
       'state_bottle_retail', 'bottles_sold', 'sale_dollars',
       'volume_sold_liters', 'volume_sold_gallons'],
      dtype='object')
In [12]:
#Counting Total Sales
sales["sale_dollars"].value_counts()
Out[12]:
162.00      3468
148.56      2536
64.80       2066
94.20       2006
70.56       1889
90.00       1772
60.12       1711
73.80       1626
62.28       1624
117.00      1594
60.72       1556
188.88      1556
64.56       1548
180.00      1516
135.00      1511
45.00       1446
126.00      1433
30.00       1363
270.00      1311
161.64      1281
99.00       1260
72.00       1246
81.00       1228
124.20      1182
132.78      1052
58.50       1051
22.50       1004
24.76        992
67.26        982
40.50        979
            ... 
139.44         1
36392.40       1
90.90          1
45.70          1
71.05          1
4500.00        1
1398.42        1
575.82         1
584.94         1
314.64         1
114.06         1
2052.48        1
6285.96        1
291.12         1
1137.24        1
967.20         1
661.50         1
547.20         1
327.12         1
7927.20        1
230.94         1
537.60         1
753.72         1
7198.50        1
180.45         1
8.46           1
109.04         1
1079.04        1
1107.12        1
127.20         1
Name: sale_dollars, Length: 6552, dtype: int64
In [13]:
#Counting individual Counties
sales["county"].value_counts()
Out[13]:
Polk             48944
Linn             23462
Scott            16630
Black Hawk       15030
Johnson          13163
Pottawattamie     9088
Story             8944
Woodbury          8541
Dubuque           7739
Cerro Gordo       6360
Des Moines        4082
Muscatine         3975
Clinton           3569
Wapello           3522
Dickinson         3409
Lee               3319
Webster           3144
Marshall          2984
Jasper            2828
Buena Vista       2737
Dallas            2707
Marion            2601
Warren            2460
Bremer            2240
Boone             2105
Poweshiek         2087
Clay              1917
Carroll           1911
Jones             1871
O'Brien           1720
                 ...  
Greene             675
Wright             671
Shelby             661
Ida                634
Howard             606
Humboldt           588
Adair              584
Grundy             566
Pocahontas         525
Mills              507
Louisa             484
Lucas              475
Chickasaw          464
Guthrie            437
Calhoun            424
Butler             402
Worth              387
Hancock            363
Monroe             352
Osceola            351
Keokuk             343
Taylor             298
Van Buren          245
Adams              234
Audubon            227
Decatur            223
Davis              203
Ringgold           201
Wayne              160
Fremont             27
Name: county, Length: 99, dtype: int64
In [14]:
#Counting individual zip codes
sales["zip_code"].value_counts()
Out[14]:
50010    7077
52402    6938
52240    6128
50613    5267
52001    4755
51501    4652
50314    4519
50317    4425
50265    4356
52404    4242
50401    4119
52722    3699
52807    3530
52405    3502
52241    3446
52761    3389
50311    3384
51503    3382
50320    3237
52501    3206
50702    3175
50315    3091
52804    2973
50501    2972
52601    2952
50703    2885
50322    2880
50266    2843
52732    2822
50158    2682
         ... 
52625      17
50514      17
50044      16
50541      16
52623      15
50542      15
51053      15
50261      15
51466      14
51002      14
50071      13
50251      12
50830      12
50150      11
52223      11
51005      11
51038      10
52337       9
51553       9
50452       8
51338       7
50162       6
50540       6
50061       6
50634       6
51535       5
51453       3
51530       3
52801       2
52328       2
Name: zip_code, Length: 412, dtype: int64
In [15]:
#Counting individual Cities
sales["city"].value_counts()
Out[15]:
DES MOINES         23618
CEDAR RAPIDS       18736
DAVENPORT          11469
WATERLOO            8376
COUNCIL BLUFFS      8037
IOWA CITY           7938
SIOUX CITY          7888
AMES                7534
WEST DES MOINES     7148
DUBUQUE             6854
CEDAR FALLS         5719
ANKENY              4823
MASON CITY          4119
BETTENDORF          3699
CORALVILLE          3446
MUSCATINE           3389
BURLINGTON          3137
CLINTON             3077
FORT DODGE          2972
WINDSOR HEIGHTS     2797
MARSHALLTOWN        2682
NEWTON              2538
STORM LAKE          2522
MARION              2485
URBANDALE           2424
OTTUMWA             2290
JOHNSTON            2137
ALTOONA             2103
CLEAR LAKE          2080
SPENCER             1910
                   ...  
ARMSTRONG             17
DONNELLSON            17
GILMORE CITY          16
BUSSEY                16
GOLDFIELD             15
SCHALLER              15
DANVILLE              15
VAN METER             15
WASHBURN              15
WALL LAKE             14
ALTA                  14
DOWS                  13
AFTON                 12
SULLY                 12
AURELIA               11
LOVILIA               11
DELHI                 11
MERRILL               10
STANWOOD               9
MINDEN                 9
LATIMER                8
EVERLY                 7
Cumming                6
MELBOURNE              6
GILBERTVILLE           6
FONDA                  6
GRISWOLD               5
LOHRVILLE              3
ROBINS                 2
Carroll                1
Name: city, Length: 382, dtype: int64
In [16]:
#Counting individual bottle retail price
sales["state_bottle_retail"].value_counts()
Out[16]:
12.38     6110
9.75      4783
13.50     4415
15.00     4222
22.50     3814
15.74     3809
10.76     2991
11.21     2944
7.50      2796
9.45      2777
10.50     2761
7.85      2618
17.24     2498
10.35     2473
11.24     2342
5.01      2334
11.43     2282
10.38     2227
12.30     2162
13.47     2142
5.06      2122
18.75     2116
16.50     2063
27.74     2051
7.13      2024
10.80     1928
5.25      1911
8.25      1869
27.00     1855
15.75     1740
          ... 
19.00        1
8.75         1
42.11        1
76.50        1
109.82       1
93.03        1
75.00        1
12.45        1
42.75        1
6.90         1
57.08        1
103.41       1
149.97       1
16.44        1
23.70        1
19.38        1
286.86       1
9.27         1
27.95        1
22.02        1
1.98         1
17.22        1
20.01        1
78.70        1
21.54        1
111.32       1
17.36        1
48.26        1
180.33       1
32.33        1
Name: state_bottle_retail, Length: 1104, dtype: int64
In [17]:
#Counting individual bottles sold
sales["bottles_sold"].value_counts()
Out[17]:
12      72607
6       51887
2       37321
1       31058
3       27758
4       15051
24      14798
48       3765
5        3189
36       2115
18       1734
10       1383
8        1227
60       1098
30        591
72        488
7         397
120       379
96        302
84        200
9         168
144       151
15        117
180       108
42        107
240       107
300       100
90         89
150        86
44         66
        ...  
1116        1
157         1
372         1
2400        1
88          1
354         1
594         1
81          1
615         1
504         1
588         1
75          1
840         1
1128        1
624         1
57          1
282         1
1080        1
378         1
564         1
816         1
390         1
396         1
39          1
37          1
97          1
402         1
28          1
1050        1
33          1
Name: bottles_sold, Length: 136, dtype: int64
In [18]:
#Most sales are under 1,000 dollars
pd.DataFrame.hist(sales, column='sale_dollars', bins=100)
plt.xlabel('Sale Dollars')
plt.ylabel('Number of Sales')
Out[18]:
<matplotlib.text.Text at 0x2246e493d30>
In [19]:
#Most bottles sold are under quantity of 100
pd.DataFrame.hist(sales, column='bottles_sold', bins=100)
plt.xlabel('Sale Dollars')
plt.ylabel('Number of Bottles')
Out[19]:
<matplotlib.text.Text at 0x2246e6b09e8>

Mine the data

Now you are ready to compute the variables you will use for your regression from the data. For example, you may want to compute total sales per store from Jan to March of 2015, mean price per bottle, etc. Refer to the readme for more ideas appropriate to your scenario.

Pandas is your friend for this task. Take a look at the operations here for ideas on how to make the best use of pandas and feel free to search for blog and Stack Overflow posts to help you group data by certain variables and compute sums, means, etc. You may find it useful to create a new data frame to house this summary data.

In [20]:
variables = sales[['state_bottle_retail', 'bottles_sold', 'sale_dollars']]
variables.head()
Out[20]:
state_bottle_retail bottles_sold sale_dollars
0 6.75 12 81.00
1 20.63 2 41.26
2 18.89 24 453.36
3 14.25 6 85.50
4 10.80 12 129.60
In [21]:
dummy_counties = pd.get_dummies(sales.county)
dummy_counties.head()
Out[21]:
Adair Adams Allamakee Appanoose Audubon Benton Black Hawk Boone Bremer Buchanan ... Wapello Warren Washington Wayne Webster Winnebago Winneshiek Woodbury Worth Wright
0 0 0 0 0 0 0 0 0 1 0 ... 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 1 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 1

5 rows × 99 columns

In [22]:
dummy_city = pd.get_dummies(sales.city)
dummy_city.head()
Out[22]:
ACKLEY ADAIR ADEL AFTON AKRON ALBIA ALDEN ALGONA ALLISON ALTA ... WEST UNION WHEATLAND WILLIAMSBURG WILTON WINDSOR HEIGHTS WINTERSET WINTHROP WOODBINE WOODWARD ZWINGLE
0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0

5 rows × 382 columns

In [ ]:
 
In [23]:
dummy_zip = pd.get_dummies(sales.zip_code)
dummy_zip.head()
Out[23]:
50002 50003 50006 50009 50010 50014 50020 50021 50022 50023 ... 52777 52778 52801 52802 52803 52804 52806 52807 56201 712-2
0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 1 0 0
2 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 1 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0

5 rows × 412 columns

Refine the data

Look for any statistical relationships, correlations, or other relevant properties of the dataset.

In [24]:
sns.heatmap(variables.corr(), annot = True, linewidths = 0.5)
Out[24]:
<matplotlib.axes._subplots.AxesSubplot at 0x2246d949240>

There appears to be a correlation between bottles sold and sale dollars

Build your models

Using scikit-learn or statsmodels, build the necessary models for your scenario. Evaluate model fit.

I am going to start prepping my data to go into the model. I will need to drop polk county since in the eda I found that it was the largest contributor to counties and since I added dummy variables I need to remove the most represented county then add a constant.

In [25]:
dummy_counties.columns
Out[25]:
Index(['Adair', 'Adams', 'Allamakee', 'Appanoose', 'Audubon', 'Benton',
       'Black Hawk', 'Boone', 'Bremer', 'Buchanan', 'Buena Vista', 'Butler',
       'Calhoun', 'Carroll', 'Cass', 'Cedar', 'Cerro Gordo', 'Cherokee',
       'Chickasaw', 'Clarke', 'Clay', 'Clayton', 'Clinton', 'Crawford',
       'Dallas', 'Davis', 'Decatur', 'Delaware', 'Des Moines', 'Dickinson',
       'Dubuque', 'Emmet', 'Fayette', 'Floyd', 'Franklin', 'Fremont', 'Greene',
       'Grundy', 'Guthrie', 'Hamilton', 'Hancock', 'Hardin', 'Harrison',
       'Henry', 'Howard', 'Humboldt', 'Ida', 'Iowa', 'Jackson', 'Jasper',
       'Jefferson', 'Johnson', 'Jones', 'Keokuk', 'Kossuth', 'Lee', 'Linn',
       'Louisa', 'Lucas', 'Lyon', 'Madison', 'Mahaska', 'Marion', 'Marshall',
       'Mills', 'Mitchell', 'Monona', 'Monroe', 'Montgomery', 'Muscatine',
       'O'Brien', 'Osceola', 'Page', 'Palo Alto', 'Plymouth', 'Pocahontas',
       'Polk', 'Pottawattamie', 'Poweshiek', 'Ringgold', 'Sac', 'Scott',
       'Shelby', 'Sioux', 'Story', 'Tama', 'Taylor', 'Union', 'Van Buren',
       'Wapello', 'Warren', 'Washington', 'Wayne', 'Webster', 'Winnebago',
       'Winneshiek', 'Woodbury', 'Worth', 'Wright'],
      dtype='object')
In [26]:
dummy_counties = dummy_counties.drop('Polk', axis=1)
variables = variables.drop('sale_dollars', axis=1)
In [27]:
sales_county = pd.concat([variables, dummy_counties], axis=1)
sales_county.head()
Out[27]:
state_bottle_retail bottles_sold Adair Adams Allamakee Appanoose Audubon Benton Black Hawk Boone ... Wapello Warren Washington Wayne Webster Winnebago Winneshiek Woodbury Worth Wright
0 6.75 12 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
1 20.63 2 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
2 18.89 24 0 0 0 0 0 0 1 0 ... 0 0 0 0 0 0 0 0 0 0
3 14.25 6 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
4 10.80 12 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 1

5 rows × 100 columns

In [28]:
dummy_city = dummy_city.drop('DES MOINES', axis=1)
dummy_zip = dummy_zip.drop('50010', axis=1)
In [29]:
sales_city = pd.concat([variables, dummy_city], axis=1)
sales_city.head()
Out[29]:
state_bottle_retail bottles_sold ACKLEY ADAIR ADEL AFTON AKRON ALBIA ALDEN ALGONA ... WEST UNION WHEATLAND WILLIAMSBURG WILTON WINDSOR HEIGHTS WINTERSET WINTHROP WOODBINE WOODWARD ZWINGLE
0 6.75 12 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
1 20.63 2 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
2 18.89 24 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
3 14.25 6 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
4 10.80 12 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0

5 rows × 383 columns

In [30]:
sales_zip = pd.concat([variables, dummy_zip], axis=1)
sales_zip.head()
Out[30]:
state_bottle_retail bottles_sold 50002 50003 50006 50009 50014 50020 50021 50022 ... 52777 52778 52801 52802 52803 52804 52806 52807 56201 712-2
0 6.75 12 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
1 20.63 2 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 1 0 0
2 18.89 24 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
3 14.25 6 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
4 10.80 12 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0

5 rows × 413 columns

In [58]:
y=sales["sale_dollars"]
sales_county = sm.add_constant(sales_county)
X_county = sales_county
sales_city = sm.add_constant(sales_city)
X_city = sales_city
sales_zip = sm.add_constant(sales_zip)
X_zip = sales_zip
In [ ]:
 
In [59]:
# Note the argument order
model_county = sm.OLS(y, X_county).fit() ## sm.OLS(output, input)
predictions_county = model_county.predict(X_county)
In [60]:
# Note the argument order
model_city = sm.OLS(y, X_city).fit() ## sm.OLS(output, input)
predictions_city = model_city.predict(X_city)
In [61]:
# Note the argument order
model_zip = sm.OLS(y, X_zip).fit() ## sm.OLS(output, input)
predictions_zip = model_zip.predict(X_zip)
In [62]:
# County statistics
model_county.summary()
Out[62]:
OLS Regression Results
Dep. Variable: sale_dollars R-squared: 0.718
Model: OLS Adj. R-squared: 0.718
Method: Least Squares F-statistic: 6854.
Date: Sun, 22 Oct 2017 Prob (F-statistic): 0.00
Time: 22:31:25 Log-Likelihood: -1.8135e+06
No. Observations: 269258 AIC: 3.627e+06
Df Residuals: 269157 BIC: 3.628e+06
Df Model: 100
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
const -103.7964 1.105 -93.925 0.000 -105.962 -101.630
state_bottle_retail 6.8393 0.037 183.278 0.000 6.766 6.912
bottles_sold 13.3495 0.016 813.993 0.000 13.317 13.382
Adair -3.0402 8.478 -0.359 0.720 -19.657 13.577
Adams -1.8369 13.347 -0.138 0.891 -27.996 24.322
Allamakee 0.9446 6.371 0.148 0.882 -11.542 13.431
Appanoose -4.2531 6.383 -0.666 0.505 -16.763 8.257
Audubon 1.0510 13.550 0.078 0.938 -25.506 27.608
Benton 1.6322 6.578 0.248 0.804 -11.261 14.525
Black Hawk -17.3436 1.900 -9.129 0.000 -21.067 -13.620
Boone 5.4154 4.534 1.194 0.232 -3.472 14.302
Bremer 3.3418 4.401 0.759 0.448 -5.285 11.969
Buchanan 1.5694 5.172 0.303 0.762 -8.567 11.706
Buena Vista -2.2199 4.002 -0.555 0.579 -10.063 5.623
Butler 4.1962 10.200 0.411 0.681 -15.796 24.188
Calhoun 8.1722 9.935 0.823 0.411 -11.299 27.644
Carroll 18.1781 4.749 3.828 0.000 8.870 27.486
Cass 2.6457 5.976 0.443 0.658 -9.066 14.358
Cedar 7.5866 6.345 1.196 0.232 -4.850 20.023
Cerro Gordo -4.8350 2.715 -1.781 0.075 -10.157 0.487
Cherokee -12.0638 6.939 -1.739 0.082 -25.664 1.536
Chickasaw 7.7258 9.500 0.813 0.416 -10.894 26.346
Clarke 12.8217 7.786 1.647 0.100 -2.439 28.082
Clay 1.3912 4.743 0.293 0.769 -7.904 10.687
Clayton 0.9740 5.812 0.168 0.867 -10.418 12.366
Clinton -0.7251 3.532 -0.205 0.837 -7.648 6.198
Crawford 3.9955 6.160 0.649 0.517 -8.079 16.070
Dallas 53.1908 4.021 13.227 0.000 45.309 61.073
Davis 4.6525 14.325 0.325 0.745 -23.424 32.729
Decatur -2.2049 13.670 -0.161 0.872 -28.998 24.588
Delaware 3.1519 7.758 0.406 0.685 -12.054 18.358
Des Moines -5.8023 3.318 -1.749 0.080 -12.306 0.702
Dickinson 5.0831 3.608 1.409 0.159 -1.988 12.154
Dubuque 2.6042 2.492 1.045 0.296 -2.279 7.488
Emmet -0.1118 7.309 -0.015 0.988 -14.438 14.215
Fayette 2.7125 6.097 0.445 0.656 -9.237 14.662
Floyd 1.2147 6.506 0.187 0.852 -11.537 13.966
Franklin -3.3577 7.732 -0.434 0.664 -18.511 11.796
Fremont -30.1678 39.207 -0.769 0.442 -107.013 46.677
Greene 2.6740 7.893 0.339 0.735 -12.797 18.145
Grundy 4.1345 8.611 0.480 0.631 -12.743 21.012
Guthrie 0.1955 9.787 0.020 0.984 -18.986 19.377
Hamilton 3.8102 6.121 0.622 0.534 -8.187 15.808
Hancock -8.2983 10.730 -0.773 0.439 -29.328 12.732
Hardin 8.7106 5.090 1.711 0.087 -1.267 18.688
Harrison 4.3795 6.214 0.705 0.481 -7.799 16.558
Henry 8.5531 6.210 1.377 0.168 -3.618 20.724
Howard 12.8337 8.325 1.542 0.123 -3.482 29.150
Humboldt -1.5956 8.450 -0.189 0.850 -18.157 14.966
Ida 5.6097 8.141 0.689 0.491 -10.347 21.566
Iowa 13.5064 5.634 2.397 0.017 2.464 24.549
Jackson 4.9213 5.199 0.947 0.344 -5.268 15.111
Jasper 1.2888 3.940 0.327 0.744 -6.434 9.012
Jefferson -5.1507 7.229 -0.713 0.476 -19.318 9.017
Johnson 6.7503 2.000 3.376 0.001 2.831 10.670
Jones 7.3444 4.799 1.530 0.126 -2.062 16.751
Keokuk 5.7851 11.037 0.524 0.600 -15.846 27.416
Kossuth 2.0699 5.061 0.409 0.683 -7.850 11.989
Lee 5.3702 3.653 1.470 0.142 -1.790 12.530
Linn 0.4465 1.618 0.276 0.783 -2.724 3.617
Louisa -36.1366 9.304 -3.884 0.000 -54.372 -17.901
Lucas 5.4799 9.390 0.584 0.560 -12.925 23.885
Lyon 0.9456 6.028 0.157 0.875 -10.870 12.761
Madison -1.6523 6.774 -0.244 0.807 -14.930 11.625
Mahaska -4.9971 5.981 -0.836 0.403 -16.719 6.725
Marion 5.0274 4.099 1.226 0.220 -3.007 13.062
Marshall 1.8498 3.841 0.482 0.630 -5.678 9.378
Mills -20.2372 9.092 -2.226 0.026 -38.057 -2.417
Mitchell 0.5647 6.476 0.087 0.931 -12.127 13.257
Monona -0.3446 5.824 -0.059 0.953 -11.759 11.070
Monroe 5.6772 10.895 0.521 0.602 -15.676 27.031
Montgomery -4.7861 7.189 -0.666 0.506 -18.877 9.305
Muscatine 3.4120 3.360 1.015 0.310 -3.174 9.998
O'Brien 8.7803 4.997 1.757 0.079 -1.014 18.575
Osceola 2.5360 10.910 0.232 0.816 -18.848 23.920
Page 5.1122 5.998 0.852 0.394 -6.644 16.868
Palo Alto 0.0944 6.258 0.015 0.988 -12.171 12.360
Plymouth 3.0856 5.232 0.590 0.555 -7.169 13.340
Pocahontas 4.5471 8.937 0.509 0.611 -12.969 22.063
Pottawattamie 7.6726 2.327 3.297 0.001 3.112 12.233
Poweshiek 4.0197 4.553 0.883 0.377 -4.905 12.944
Ringgold 12.4488 14.395 0.865 0.387 -15.766 40.663
Sac -4.5672 6.874 -0.664 0.506 -18.041 8.906
Scott -11.6747 1.828 -6.386 0.000 -15.258 -8.091
Shelby 11.7838 7.975 1.478 0.140 -3.847 27.415
Sioux 18.1440 5.664 3.203 0.001 7.042 29.246
Story 3.9777 2.343 1.698 0.090 -0.614 8.569
Tama 0.9065 6.562 0.138 0.890 -11.955 13.768
Taylor 3.8606 11.835 0.326 0.744 -19.335 27.057
Union 5.7824 6.286 0.920 0.358 -6.538 18.102
Van Buren 10.2299 13.045 0.784 0.433 -15.338 35.797
Wapello 9.0755 3.555 2.553 0.011 2.108 16.043
Warren -0.0940 4.209 -0.022 0.982 -8.343 8.155
Washington -2.8096 5.265 -0.534 0.594 -13.128 7.509
Wayne -8.7035 16.128 -0.540 0.589 -40.314 22.907
Webster -5.2554 3.747 -1.402 0.161 -12.600 2.089
Winnebago -3.8053 6.669 -0.571 0.568 -16.876 9.265
Winneshiek -1.1398 5.713 -0.200 0.842 -12.337 10.057
Woodbury 0.3815 2.388 0.160 0.873 -4.300 5.063
Worth 8.8589 10.394 0.852 0.394 -11.514 29.231
Wright -0.8212 7.916 -0.104 0.917 -16.337 14.695
Omnibus: 366597.951 Durbin-Watson: 2.001
Prob(Omnibus): 0.000 Jarque-Bera (JB): 66620643079.357
Skew: 5.938 Prob(JB): 0.00
Kurtosis: 2439.804 Cond. No. 2.68e+03
In [63]:
# City Statistics
model_city.summary()
Out[63]:
OLS Regression Results
Dep. Variable: sale_dollars R-squared: 0.718
Model: OLS Adj. R-squared: 0.718
Method: Least Squares F-statistic: 1791.
Date: Sun, 22 Oct 2017 Prob (F-statistic): 0.00
Time: 22:32:18 Log-Likelihood: -1.8133e+06
No. Observations: 269258 AIC: 3.627e+06
Df Residuals: 268874 BIC: 3.631e+06
Df Model: 383
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
const -111.2714 1.465 -75.970 0.000 -114.142 -108.401
state_bottle_retail 6.8100 0.037 181.877 0.000 6.737 6.883
bottles_sold 13.3641 0.016 812.038 0.000 13.332 13.396
ACKLEY 12.6694 21.388 0.592 0.554 -29.250 54.589
ADAIR -16.2146 31.082 -0.522 0.602 -77.134 44.705
ADEL 21.9562 11.496 1.910 0.056 -0.576 44.488
AFTON 9.7649 58.798 0.166 0.868 -105.478 125.007
AKRON 7.8456 24.550 0.320 0.749 -40.273 55.964
ALBIA 13.9177 11.107 1.253 0.210 -7.852 35.688
ALDEN 4.5391 18.484 0.246 0.806 -31.689 40.768
ALGONA 10.4988 6.497 1.616 0.106 -2.236 23.233
ALLISON 16.4433 22.391 0.734 0.463 -27.443 60.329
ALTA -16.2747 54.439 -0.299 0.765 -122.973 90.424
ALTOONA 9.2230 4.634 1.990 0.047 0.140 18.306
AMES 12.1338 2.696 4.501 0.000 6.850 17.418
ANAMOSA 16.3067 8.538 1.910 0.056 -0.427 33.040
ANITA 10.1520 16.791 0.605 0.545 -22.759 43.063
ANKENY 17.5795 3.219 5.461 0.000 11.270 23.889
ANTHON 13.1279 15.957 0.823 0.411 -18.147 44.403
ARLINGTON 28.7329 23.397 1.228 0.219 -17.125 74.591
ARMSTRONG -28.7348 49.406 -0.582 0.561 -125.568 68.099
ARNOLD'S PARK 1.3154 13.158 0.100 0.920 -24.475 27.106
ARNOLDS PARK 11.8707 9.231 1.286 0.198 -6.221 29.963
ATLANTIC 10.5251 6.462 1.629 0.103 -2.139 23.190
AUDUBON 12.3132 14.284 0.862 0.389 -15.684 40.310
AURELIA -6.9803 61.411 -0.114 0.910 -127.344 113.384
AVOCA 11.3464 9.618 1.180 0.238 -7.505 30.197
BALDWIN -19.7364 39.957 -0.494 0.621 -98.052 58.579
BANCROFT 8.9044 10.655 0.836 0.403 -11.980 29.789
BAXTER 20.9895 26.772 0.784 0.433 -31.482 73.461
BEDFORD 17.5580 16.306 1.077 0.282 -14.402 49.518
BELLE PLAINE 4.5213 10.096 0.448 0.654 -15.267 24.309
BELLEVUE 7.7428 13.266 0.584 0.559 -18.258 33.744
BELMOND 0.3506 15.194 0.023 0.982 -29.429 30.130
BETTENDORF 16.6460 3.602 4.621 0.000 9.586 23.706
BEVINGTON 14.0133 23.709 0.591 0.554 -32.456 60.483
BLOOMFIELD 12.4578 14.354 0.868 0.385 -15.676 40.592
BLUE GRASS 23.7859 9.969 2.386 0.017 4.247 43.325
BONDURANT 18.6969 10.772 1.736 0.083 -2.415 39.809
BOONE 14.2969 5.528 2.586 0.010 3.462 25.132
BRITT 6.4683 16.306 0.397 0.692 -25.491 38.427
BROOKLYN 8.1753 14.681 0.557 0.578 -20.598 36.949
BUFFALO -17.0115 33.965 -0.501 0.616 -83.581 49.558
BUFFALO CENTER 10.2996 16.625 0.620 0.536 -22.285 42.884
BURLINGTON -0.6583 3.870 -0.170 0.865 -8.244 6.928
BUSSEY -140.3028 50.926 -2.755 0.006 -240.116 -40.489
CAMANCHE -9.6778 17.323 -0.559 0.576 -43.630 24.274
CAMBRIDGE 21.5206 28.546 0.754 0.451 -34.428 77.469
CARLISLE 7.7813 15.194 0.512 0.609 -21.998 37.561
CARROLL 30.8795 5.463 5.652 0.000 20.171 41.588
CARTER LAKE -45.0795 15.909 -2.834 0.005 -76.260 -13.899
CASCADE 14.8590 16.679 0.891 0.373 -17.832 47.550
CASEY -17.8852 33.503 -0.534 0.593 -83.550 47.780
CEDAR FALLS 8.2666 3.003 2.753 0.006 2.381 14.152
CEDAR RAPIDS 8.5375 1.994 4.282 0.000 4.630 12.445
CENTER POINT 18.8264 15.451 1.218 0.223 -11.457 49.110
CENTERVILLE 3.2231 6.662 0.484 0.629 -9.835 16.281
CENTRAL CITY -7.2243 17.774 -0.406 0.684 -42.061 27.612
CHARITON 13.2592 9.437 1.405 0.160 -5.238 31.756
CHARLES CITY 6.2152 7.217 0.861 0.389 -7.929 20.360
CHEROKEE -6.3219 7.370 -0.858 0.391 -20.767 8.123
CLARINDA 9.7266 9.115 1.067 0.286 -8.138 27.591
CLARION 8.2250 15.861 0.519 0.604 -22.862 39.312
CLARKSVILLE 9.4002 25.101 0.375 0.708 -39.796 58.597
CLEAR LAKE 8.4477 4.659 1.813 0.070 -0.684 17.579
CLINTON 6.7116 3.904 1.719 0.086 -0.940 14.363
CLIVE -1.0321 6.798 -0.152 0.879 -14.356 12.291
COLFAX 12.9306 14.756 0.876 0.381 -15.991 41.853
COLO -7.7096 39.211 -0.197 0.844 -84.562 69.143
COLUMBUS JUNCTION -56.5840 12.031 -4.703 0.000 -80.165 -33.003
CONRAD 10.9844 20.509 0.536 0.592 -29.213 51.182
COON RAPIDS 2.6825 18.793 0.143 0.886 -34.151 39.516
CORALVILLE 31.6748 3.714 8.528 0.000 24.395 38.954
CORNING 9.1684 9.756 0.940 0.347 -9.953 28.289
CORWITH 23.9953 37.201 0.645 0.519 -48.918 96.909
CORYDON 1.7810 17.909 0.099 0.921 -33.320 36.883
COUNCIL BLUFFS 17.3579 2.631 6.599 0.000 12.202 22.514
CRESCENT 8.6032 39.957 0.215 0.830 -69.712 86.918
CRESCO 19.7904 8.722 2.269 0.023 2.696 36.885
CRESTON 13.6301 6.391 2.133 0.033 1.103 26.157
Carroll 244.1771 203.635 1.199 0.230 -154.941 643.295
Cumming -127.8339 83.144 -1.538 0.124 -290.793 35.125
DAKOTA CITY -89.7085 40.748 -2.202 0.028 -169.574 -9.843
DANVILLE -15.6578 52.594 -0.298 0.766 -118.741 87.425
DAVENPORT -12.5572 2.318 -5.417 0.000 -17.101 -8.014
DAYTON 18.7383 21.158 0.886 0.376 -22.730 60.206
DE SOTO 7.0988 17.774 0.399 0.690 -27.737 41.935
DECORAH 4.9740 6.163 0.807 0.420 -7.105 17.053
DELAWARE 0.4450 46.735 0.010 0.992 -91.154 92.044
DELHI 21.3423 61.412 0.348 0.728 -99.023 141.707
DELMAR -6.1927 37.837 -0.164 0.870 -80.351 67.966
DENISON 13.6400 6.879 1.983 0.047 0.157 27.123
DENVER 13.7529 18.335 0.750 0.453 -22.183 49.689
DEWITT 23.1845 14.424 1.607 0.108 -5.086 51.455
DONNELLSON 10.5607 49.405 0.214 0.831 -86.272 107.394
DOWS -9.0705 56.493 -0.161 0.872 -119.794 101.653
DUBUQUE 9.8759 2.795 3.534 0.000 4.398 15.353
DUMONT 11.5202 25.896 0.445 0.656 -39.234 62.275
DUNLAP 8.8208 15.030 0.587 0.557 -20.638 38.280
DURANT 17.2065 13.026 1.321 0.187 -8.324 42.737
DYERSVILLE 17.7488 8.953 1.982 0.047 0.202 35.296
DYSART 6.3149 33.503 0.188 0.850 -59.350 71.980
Des Moines -7.9060 20.719 -0.382 0.703 -48.514 32.702
Dubuque -11.0442 26.106 -0.423 0.672 -62.211 40.123
EAGLE GROVE 14.9995 11.910 1.259 0.208 -8.344 38.343
EARLHAM 11.3089 33.503 0.338 0.736 -54.356 76.974
EARLY 5.7175 36.597 0.156 0.876 -66.012 77.447
EDDYVILLE 1.9253 36.022 0.053 0.957 -68.676 72.526
EDGEWOOD 18.5049 21.999 0.841 0.400 -24.612 61.621
ELDON -7.9854 32.634 -0.245 0.807 -71.947 55.976
ELDORA 14.5795 11.532 1.264 0.206 -8.023 37.182
ELDRIDGE 0.8908 7.855 0.113 0.910 -14.504 16.285
ELKADER 13.2149 18.560 0.712 0.476 -23.162 49.591
ELLSWORTH 21.6375 21.272 1.017 0.309 -20.055 63.330
ELMA 30.0380 29.422 1.021 0.307 -27.627 87.703
ELY 0.4269 31.082 0.014 0.989 -60.492 61.346
EMMETSBURG 7.5235 6.775 1.111 0.267 -5.754 20.801
ESTHERVILLE 8.5041 7.449 1.142 0.254 -6.095 23.104
EVANSDALE -10.0774 9.370 -1.075 0.282 -28.443 8.288
EVERLY 4.7498 76.976 0.062 0.951 -146.122 155.621
EXIRA -23.9271 43.434 -0.551 0.582 -109.057 61.203
FAIRBANK 5.2703 21.998 0.240 0.811 -37.846 48.387
FAIRFAX -6.9754 33.965 -0.205 0.837 -73.545 59.595
FAIRFIELD 2.6637 7.290 0.365 0.715 -11.625 16.952
FARLEY -6.3658 37.837 -0.168 0.866 -80.525 67.794
FARMINGTON 12.0277 36.022 0.334 0.738 -58.574 82.629
FAYETTE -4.8607 24.550 -0.198 0.843 -52.978 43.257
FLOYD 26.8015 17.773 1.508 0.132 -8.034 61.637
FONDA -16.0151 83.142 -0.193 0.847 -178.972 146.942
FONTANELLE 5.4821 25.896 0.212 0.832 -45.273 56.237
FOREST CITY 2.5327 8.460 0.299 0.765 -14.048 19.114
FORT ATKINSON 18.6753 16.154 1.156 0.248 -12.985 50.336
FORT DODGE 6.7896 3.964 1.713 0.087 -0.979 14.558
FORT MADISON 4.3457 5.635 0.771 0.441 -6.698 15.390
FREDERICKSBURG 20.6748 23.095 0.895 0.371 -24.591 65.941
GARNER -11.0150 15.407 -0.715 0.475 -41.211 19.181
GEORGE 14.2376 18.637 0.764 0.445 -22.290 50.765
GILBERTVILLE 25.6181 83.142 0.308 0.758 -137.339 188.575
GILMORE CITY -33.7885 50.925 -0.663 0.507 -133.601 66.024
GLADBROOK 22.4710 23.709 0.948 0.343 -23.999 68.941
GLENWOOD -2.3770 10.889 -0.218 0.827 -23.719 18.965
GLIDDEN 16.4634 28.003 0.588 0.557 -38.421 71.348
GOLDFIELD -73.1030 52.594 -1.390 0.165 -176.186 29.980
GOWRIE 2.4441 14.533 0.168 0.866 -26.040 30.928
GRAETTINGER 11.3875 20.719 0.550 0.583 -29.220 51.995
GRAND JUNCTION -23.8708 38.505 -0.620 0.535 -99.340 51.599
GRAND MOUND 11.7441 25.489 0.461 0.645 -38.213 61.701
GRANGER 2.4954 25.896 0.096 0.923 -48.259 53.250
GREENE 10.9579 19.823 0.553 0.580 -27.896 49.811
GREENFIELD 4.3470 14.680 0.296 0.767 -24.426 33.120
GRIMES 10.5076 5.365 1.958 0.050 -0.008 21.023
GRINNELL 12.1846 5.451 2.235 0.025 1.500 22.869
GRISWOLD 2.1828 91.076 0.024 0.981 -176.323 180.689
GRUNDY CENTER 14.6975 10.109 1.454 0.146 -5.116 34.510
GUTHRIE CENTER 17.6120 17.386 1.013 0.311 -16.463 51.687
GUTTENBERG 4.9218 12.897 0.382 0.703 -20.355 30.199
GUTTENBURG 20.4662 20.406 1.003 0.316 -19.530 60.462
HAMBURG -22.5931 39.211 -0.576 0.564 -99.446 54.260
HAMPTON 5.1508 7.985 0.645 0.519 -10.499 20.801
HARLAN 19.5010 8.031 2.428 0.015 3.761 35.241
HARPERS FERRY -21.1814 29.421 -0.720 0.472 -78.847 36.484
HARTLEY 21.5626 11.891 1.813 0.070 -1.744 44.869
HAWARDEN 6.4730 15.539 0.417 0.677 -23.982 36.929
HAZLETON -138.5507 36.598 -3.786 0.000 -210.281 -66.820
HIAWATHA -24.0826 10.358 -2.325 0.020 -44.384 -3.781
HOLSTEIN 12.6384 11.461 1.103 0.270 -9.825 35.102
HOLY CROSS 17.5875 26.322 0.668 0.504 -34.003 69.178
HOSPERS 23.5566 39.211 0.601 0.548 -53.297 100.410
HUBBARD 21.4838 23.244 0.924 0.355 -24.075 67.042
HUDSON 5.3643 32.224 0.166 0.868 -57.794 68.523
HUMBOLDT 11.7142 8.808 1.330 0.184 -5.548 28.977
HUMESTON -12.5408 37.202 -0.337 0.736 -85.455 60.374
HUXLEY 13.0762 18.714 0.699 0.485 -23.603 49.755
IDA GROVE 14.2222 11.568 1.229 0.219 -8.450 36.895
INDEPENDENCE 16.3160 6.928 2.355 0.019 2.738 29.894
INDIANOLA 3.8883 5.030 0.773 0.439 -5.970 13.747
INWOOD 15.1291 13.436 1.126 0.260 -11.204 41.463
IOWA CITY 8.3816 2.642 3.172 0.002 3.203 13.560
IOWA FALLS 18.5881 7.904 2.352 0.019 3.097 34.080
IRETON 15.8366 33.964 0.466 0.641 -50.733 82.406
Inwood 21.0536 36.597 0.575 0.565 -50.676 92.783
JEFFERSON 12.2832 8.272 1.485 0.138 -3.930 28.497
JESUP 9.8436 15.195 0.648 0.517 -19.937 39.625
JEWELL 22.1007 21.999 1.005 0.315 -21.017 65.218
JOHNSTON 16.1114 4.602 3.501 0.000 7.091 25.132
KELLOG 10.5823 45.553 0.232 0.816 -78.700 99.865
KELLOGG 5.9998 22.665 0.265 0.791 -38.423 50.423
KEOKUK 20.4061 5.105 3.997 0.000 10.400 30.412
KEOSAUQUA 18.9282 14.016 1.350 0.177 -8.543 46.399
KEOTA 4.9876 30.728 0.162 0.871 -55.238 65.213
KINGSLEY 1.6400 28.828 0.057 0.955 -54.863 58.143
KNOXVILLE 12.4646 5.619 2.218 0.027 1.452 23.477
LA PORTE CITY 16.4268 12.999 1.264 0.206 -9.052 41.905
LAKE CITY 10.7820 17.448 0.618 0.537 -23.416 44.980
LAKE MILLS 3.5394 14.250 0.248 0.804 -24.390 31.469
LAKE PARK 16.7145 35.472 0.471 0.637 -52.810 86.239
LAKE VIEW 0.8195 14.680 0.056 0.955 -27.953 29.592
LAMONI 12.2978 19.824 0.620 0.535 -26.556 51.151
LANSING 10.4263 16.680 0.625 0.532 -22.266 43.118
LARCHWOOD 8.2965 11.755 0.706 0.480 -14.742 31.336
LATIMER 17.8635 72.006 0.248 0.804 -123.267 158.994
LAURENS 10.1463 14.911 0.680 0.496 -19.079 39.372
LAWLER 44.4770 31.829 1.397 0.162 -17.908 106.862
LE CLAIRE 3.4458 12.397 0.278 0.781 -20.851 27.743
LE GRAND 15.9082 12.243 1.299 0.194 -8.088 39.904
LE MARS 4.6337 9.990 0.464 0.643 -14.945 24.213
LECLAIRE 34.4375 11.496 2.996 0.003 11.906 56.969
LEMARS 14.0501 6.988 2.011 0.044 0.353 27.747
LENOX 5.1152 17.201 0.297 0.766 -28.598 38.828
LEON -0.6040 18.872 -0.032 0.974 -37.593 36.386
LISBON 12.1819 17.323 0.703 0.482 -21.771 46.134
LOGAN 17.3734 20.109 0.864 0.388 -22.039 56.786
LOHRVILLE 26.8624 117.573 0.228 0.819 -203.578 257.303
LOST NATION 5.2824 35.472 0.149 0.882 -64.243 74.807
LOVILIA -1.3982 61.411 -0.023 0.982 -121.763 118.966
MADRID 15.4158 15.237 1.012 0.312 -14.447 45.279
MALVERN -47.8482 37.202 -1.286 0.198 -120.763 25.067
MANCHESTER 11.0398 7.984 1.383 0.167 -4.608 26.688
MANLY -18.0850 32.225 -0.561 0.575 -81.244 45.074
MANNING 9.6254 12.627 0.762 0.446 -15.124 34.375
MANSON 18.7842 15.584 1.205 0.228 -11.760 49.328
MAPLETON 9.3938 9.854 0.953 0.340 -9.920 28.708
MAQUOKETA 15.4257 5.869 2.628 0.009 3.922 26.929
MARCUS 16.8299 23.244 0.724 0.469 -28.728 62.388
MARENGO 27.6627 9.956 2.779 0.005 8.150 47.175
MARION 12.5049 4.296 2.911 0.004 4.086 20.924
MARSHALLTOWN 8.9378 4.150 2.154 0.031 0.804 17.072
MARTENSDALE -6.4502 45.552 -0.142 0.887 -95.732 82.831
MASON CITY -0.5532 3.439 -0.161 0.872 -7.294 6.188
MAXWELL 16.8655 39.957 0.422 0.673 -61.450 95.181
MECHANICSVILLE -42.9215 36.022 -1.192 0.233 -113.523 27.680
MEDIAPOLIS 5.8980 11.076 0.533 0.594 -15.811 27.607
MELBOURNE 18.2901 83.142 0.220 0.826 -144.666 181.246
MELCHER-DALLAS 6.9854 15.629 0.447 0.655 -23.646 37.617
MERRILL 1.0184 64.407 0.016 0.987 -125.218 127.255
MILFORD 15.0328 6.742 2.230 0.026 1.819 28.247
MINDEN 14.2626 67.890 0.210 0.834 -118.799 147.325
MISSOURI VALLEY 10.5286 8.062 1.306 0.192 -5.273 26.330
MONONA 3.1677 9.877 0.321 0.748 -16.191 22.526
MONROE 9.5374 18.560 0.514 0.607 -26.840 45.914
MONTEZUMA 11.2643 11.044 1.020 0.308 -10.381 32.910
MONTICELLO 14.6337 5.830 2.510 0.012 3.208 26.060
MONTROSE -14.5274 33.061 -0.439 0.660 -79.325 50.271
MORAVIA 7.9187 24.913 0.318 0.751 -40.911 56.748
MOUNT AYR 20.2394 14.425 1.403 0.161 -8.032 48.511
MOUNT PLEASANT 15.2371 6.815 2.236 0.025 1.879 28.595
MOUNT VERNON 8.7258 6.191 1.409 0.159 -3.409 20.860
MT PLEASANT 21.9054 15.583 1.406 0.160 -8.637 52.448
MT VERNON -35.2418 24.036 -1.466 0.143 -82.351 11.867
MUSCATINE 10.5240 3.742 2.812 0.005 3.189 17.859
NASHUA 11.7494 21.998 0.534 0.593 -31.367 54.866
NEOLA 23.4626 25.690 0.913 0.361 -26.889 73.814
NEVADA 9.0361 6.831 1.323 0.186 -4.353 22.425
NEW HAMPTON 10.6192 12.723 0.835 0.404 -14.317 35.556
NEW SHARON 11.7852 38.505 0.306 0.760 -63.684 87.255
NEW VIRGINIA 17.2334 34.446 0.500 0.617 -50.279 84.746
NEWTON 5.1702 4.256 1.215 0.224 -3.171 13.512
NORA SPRINGS 6.5939 30.727 0.215 0.830 -53.631 66.819
NORTH ENGLISH 20.1190 36.022 0.559 0.576 -50.484 90.721
NORTH LIBERTY -0.5936 5.789 -0.103 0.918 -11.940 10.753
NORTHWOOD 16.5553 13.919 1.189 0.234 -10.726 43.837
NORWALK 23.7449 9.628 2.466 0.014 4.875 42.615
Northwood 27.3407 17.841 1.532 0.125 -7.626 62.308
OAKLAND 3.9275 12.821 0.306 0.759 -21.202 29.057
OELWEIN 8.9083 8.398 1.061 0.289 -7.551 25.368
OGDEN 9.9022 18.409 0.538 0.591 -26.179 45.983
OKOBOJI 13.8150 32.224 0.429 0.668 -49.344 76.974
ONAWA 6.4833 7.236 0.896 0.370 -7.698 20.665
ORANGE CITY 31.1245 11.870 2.622 0.009 7.859 54.390
OSAGE 13.0410 7.368 1.770 0.077 -1.399 27.481
OSCEOLA 20.6072 7.843 2.627 0.009 5.235 35.980
OSKALOOSA 3.0222 5.400 0.560 0.576 -7.561 13.605
OTHO -21.9789 45.553 -0.482 0.629 -111.261 67.303
OTTUMWA 17.3559 4.459 3.893 0.000 8.617 26.095
OTTUWMA 19.6167 6.859 2.860 0.004 6.173 33.060
PACIFIC JUNCTION -33.2145 18.484 -1.797 0.072 -69.442 3.013
PALO -26.9541 33.060 -0.815 0.415 -91.750 37.842
PANORA 6.4980 12.651 0.514 0.607 -18.297 31.293
PARKERSBURG 11.0946 22.127 0.501 0.616 -32.273 54.462
PAULLINA -4.4437 18.048 -0.246 0.806 -39.817 30.929
PELLA 21.8905 7.074 3.095 0.002 8.027 35.754
PEOSTA 13.3592 27.244 0.490 0.624 -40.038 66.756
PERRY 11.9243 7.265 1.641 0.101 -2.315 26.164
PLEASANT HILL 0.2158 6.905 0.031 0.975 -13.319 13.750
PLEASANTVILLE -10.8642 16.005 -0.679 0.497 -42.234 20.506
POCAHONTAS 13.8810 11.891 1.167 0.243 -9.424 37.186
POLK CITY 16.2014 12.396 1.307 0.191 -8.094 40.497
POSTVILLE -1.3631 30.053 -0.045 0.964 -60.266 57.540
PRAIRIE CITY -15.9101 35.473 -0.449 0.654 -85.436 53.616
PRIMGHAR 5.7114 19.640 0.291 0.771 -32.782 44.205
PRINCETON 9.8593 32.635 0.302 0.763 -54.104 73.822
RAYMOND 0.4309 17.022 0.025 0.980 -32.931 33.793
RED OAK 3.0798 7.348 0.419 0.675 -11.322 17.482
REINBECK 5.7782 35.472 0.163 0.871 -63.747 75.303
REMSEN 15.5809 17.841 0.873 0.382 -19.387 50.549
RICEVILLE 13.8661 35.472 0.391 0.696 -55.659 83.391
RIVERSIDE 17.8288 13.731 1.298 0.194 -9.084 44.741
ROBINS 25.6121 143.994 0.178 0.859 -256.613 307.837
ROCK RAPIDS 3.8871 9.370 0.415 0.678 -14.478 22.252
ROCK VALLEY 5.8394 21.045 0.277 0.781 -35.408 47.087
ROCKWELL 21.8659 16.104 1.358 0.175 -9.698 53.429
ROCKWELL CITY 17.4914 19.288 0.907 0.364 -20.313 55.296
ROLFE 15.4339 34.948 0.442 0.659 -53.063 83.930
RUTHVEN 9.3762 30.053 0.312 0.755 -49.527 68.280
SAC CITY 3.5511 8.159 0.435 0.663 -12.441 19.543
SANBORN 22.6244 15.408 1.468 0.142 -7.574 52.823
SCHALLER 8.8142 52.594 0.168 0.867 -94.268 111.897
SCHLESWIG 3.5061 14.183 0.247 0.805 -24.292 31.304
SCRANTON 4.1263 40.748 0.101 0.919 -75.739 83.991
SERGEANT BLUFF 9.4991 10.586 0.897 0.370 -11.249 30.247
SHEFFIELD -16.7184 39.211 -0.426 0.670 -93.572 60.135
SHELDON 18.7288 6.652 2.815 0.005 5.691 31.767
SHELLSBURG 11.3929 11.306 1.008 0.314 -10.767 33.553
SHENANDOAH 15.2834 7.973 1.917 0.055 -0.342 30.909
SIBLEY 10.3781 10.950 0.948 0.343 -11.084 31.840
SIGOURNEY 14.8261 11.852 1.251 0.211 -8.403 38.056
SIOUX CENTER 31.8050 7.810 4.072 0.000 16.497 47.113
SIOUX CITY 7.8343 2.649 2.958 0.003 2.643 13.025
SIOUX RAPIDS 8.1861 14.425 0.567 0.570 -20.086 36.459
SLATER 7.9728 25.292 0.315 0.753 -41.600 57.545
SLOAN 16.8442 19.203 0.877 0.380 -20.792 54.481
SOLON 9.0008 15.628 0.576 0.565 -21.630 39.632
SPENCER 9.2005 4.846 1.899 0.058 -0.297 18.698
SPIRIT LAKE 13.5214 5.189 2.606 0.009 3.352 23.691
SPRINGVILLE 4.1382 39.211 0.106 0.916 -72.715 80.991
ST ANSGAR -12.1602 14.951 -0.813 0.416 -41.463 17.143
ST CHARLES 0.6915 38.506 0.018 0.986 -74.778 76.161
ST LUCAS 18.4430 27.006 0.683 0.495 -34.487 71.373
STANWOOD -86.2965 67.890 -1.271 0.204 -219.359 46.766
STATE CENTER 25.8882 15.676 1.652 0.099 -4.835 56.612
STORM LAKE 5.5457 4.268 1.299 0.194 -2.820 13.911
STORY CITY 13.1286 12.974 1.012 0.312 -12.300 38.557
STRATFORD 1.0744 22.949 0.047 0.963 -43.905 46.054
STRAWBERRY POINT 10.7485 26.771 0.402 0.688 -41.722 63.219
STUART 7.9827 12.135 0.658 0.511 -15.802 31.768
SULLY 1.1542 58.798 0.020 0.984 -114.088 116.397
SUMNER 8.0987 11.222 0.722 0.471 -13.897 30.094
SUTHERLAND -6.2071 34.445 -0.180 0.857 -73.719 61.305
SWEA CITY -0.6337 35.472 -0.018 0.986 -70.158 68.891
SWISHER 58.1100 13.131 4.425 0.000 32.373 83.847
TIFFIN 5.0404 17.774 0.284 0.777 -29.796 39.877
TIPTON 16.0052 9.177 1.744 0.081 -1.981 33.991
TOLEDO 11.9659 7.454 1.605 0.108 -2.643 26.575
TRAER -5.3410 15.450 -0.346 0.730 -35.623 24.942
TREYNOR -3.9643 41.587 -0.095 0.924 -85.474 77.545
TRIPOLI 15.1197 13.266 1.140 0.254 -10.882 41.122
URBANA -1.5880 37.837 -0.042 0.967 -75.747 72.571
URBANDALE 12.8917 4.344 2.968 0.003 4.378 21.405
Urbandale 3.6420 33.060 0.110 0.912 -61.154 68.438
VAN METER -84.4213 52.594 -1.605 0.108 -187.504 18.662
VICTOR 18.0625 25.489 0.709 0.479 -31.895 68.020
VILLISCA 1.0963 43.434 0.025 0.980 -84.034 86.227
VINTON 17.6707 14.250 1.240 0.215 -10.258 45.599
WALFORD 16.5690 37.201 0.445 0.656 -56.345 89.483
WALKER -16.2928 45.553 -0.358 0.721 -105.574 72.989
WALL LAKE 13.4953 54.439 0.248 0.804 -93.203 120.193
WALNUT 0.8355 28.270 0.030 0.976 -54.572 56.243
WAPELLO 13.6309 14.681 0.928 0.353 -15.143 42.405
WASHBURN -64.1395 52.594 -1.220 0.223 -167.222 38.943
WASHINGTON 2.8315 5.756 0.492 0.623 -8.450 14.113
WATERLOO -22.7595 2.590 -8.787 0.000 -27.836 -17.683
WAUKEE 18.2832 5.611 3.258 0.001 7.285 29.281
WAUKON 10.8178 7.321 1.478 0.140 -3.532 25.167
WAVERLY 10.9909 5.350 2.054 0.040 0.505 21.477
WEBSTER CITY 10.4118 7.008 1.486 0.137 -3.323 24.147
WELLMAN 7.5048 13.078 0.574 0.566 -18.128 33.137
WELLSBURG -29.4621 44.456 -0.663 0.508 -116.594 57.670
WESLEY 9.1705 17.261 0.531 0.595 -24.661 43.002
WEST BEND 11.4648 19.823 0.578 0.563 -27.388 50.317
WEST BRANCH 22.8898 12.652 1.809 0.070 -1.908 47.687
WEST BURLINGTON 14.0926 8.509 1.656 0.098 -2.584 30.770
WEST DES MOINES 30.2788 2.750 11.011 0.000 24.889 35.669
WEST LIBERTY 8.6568 12.419 0.697 0.486 -15.683 32.997
WEST POINT 17.8887 15.450 1.158 0.247 -12.393 48.171
WEST UNION 10.8489 11.172 0.971 0.332 -11.048 32.746
WHEATLAND 11.2630 39.957 0.282 0.778 -67.053 89.578
WILLIAMSBURG 16.4116 12.626 1.300 0.194 -8.336 41.159
WILTON 20.7052 11.570 1.790 0.074 -1.971 43.382
WINDSOR HEIGHTS 32.6892 4.072 8.028 0.000 24.708 40.670
WINTERSET 5.3052 7.402 0.717 0.474 -9.202 19.813
WINTHROP 15.6299 28.002 0.558 0.577 -39.254 70.514
WOODBINE 19.7643 16.411 1.204 0.228 -12.400 51.929
WOODWARD 11.6388 34.446 0.338 0.735 -55.874 79.151
ZWINGLE -15.5083 28.270 -0.549 0.583 -70.916 39.900
Omnibus: 364812.861 Durbin-Watson: 2.001
Prob(Omnibus): 0.000 Jarque-Bera (JB): 67102567296.771
Skew: 5.870 Prob(JB): 0.00
Kurtosis: 2448.603 Cond. No. 1.39e+04
In [64]:
# Zip Code Statistics
model_zip.summary()
Out[64]:
OLS Regression Results
Dep. Variable: sale_dollars R-squared: 0.719
Model: OLS Adj. R-squared: 0.719
Method: Least Squares F-statistic: 1670.
Date: Sun, 22 Oct 2017 Prob (F-statistic): 0.00
Time: 22:32:28 Log-Likelihood: -1.8128e+06
No. Observations: 269258 AIC: 3.626e+06
Df Residuals: 268844 BIC: 3.631e+06
Df Model: 413
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
const -96.5788 2.494 -38.719 0.000 -101.468 -91.690
state_bottle_retail 6.7432 0.038 179.585 0.000 6.670 6.817
bottles_sold 13.3491 0.017 808.482 0.000 13.317 13.381
50002 -29.9087 31.092 -0.962 0.336 -90.848 31.030
50003 8.2845 11.652 0.711 0.477 -14.554 31.123
50006 -8.8378 18.561 -0.476 0.634 -45.216 27.541
50009 -4.3451 5.048 -0.861 0.389 -14.240 5.550
50014 -22.4837 9.811 -2.292 0.022 -41.712 -3.255
50020 -3.5390 16.882 -0.210 0.834 -36.628 29.550
50021 19.1865 4.715 4.069 0.000 9.945 28.428
50022 -3.0223 6.759 -0.447 0.655 -16.269 10.225
50023 -12.3539 4.878 -2.533 0.011 -21.914 -2.794
50025 -1.3936 14.401 -0.097 0.923 -29.619 26.832
50028 7.1533 26.799 0.267 0.790 -45.373 59.679
50033 0.1676 23.753 0.007 0.994 -46.387 46.722
50035 4.8937 10.940 0.447 0.655 -16.548 26.336
50036 0.6791 5.877 0.116 0.908 -10.839 12.197
50044 -154.1943 50.875 -3.031 0.002 -253.907 -54.481
50046 7.6630 28.565 0.268 0.788 -48.324 63.650
50047 -5.8497 15.301 -0.382 0.702 -35.839 24.140
50048 -31.6252 33.504 -0.944 0.345 -97.292 34.042
50049 -0.3689 9.634 -0.038 0.969 -19.252 18.514
50054 -0.8693 14.868 -0.058 0.953 -30.010 28.271
50056 -21.3511 39.193 -0.545 0.586 -98.168 55.465
50058 -10.8854 18.867 -0.577 0.564 -47.865 26.094
50060 -11.7605 17.990 -0.654 0.513 -47.021 23.500
50061 -141.0714 83.019 -1.699 0.089 -303.786 21.643
50069 -6.6335 17.856 -0.371 0.710 -41.631 28.364
50071 -22.8323 56.427 -0.405 0.686 -133.428 87.763
50072 -2.4535 33.504 -0.073 0.942 -68.120 63.213
50075 7.8137 21.330 0.366 0.714 -33.992 49.619
50076 -37.6038 43.403 -0.866 0.386 -122.673 47.466
50107 -37.4722 38.489 -0.974 0.330 -112.910 37.965
50109 -11.3427 25.928 -0.437 0.662 -62.161 39.475
50111 -3.0709 5.723 -0.537 0.592 -14.288 8.147
50112 -1.4448 5.804 -0.249 0.803 -12.820 9.930
50115 3.8363 17.471 0.220 0.826 -30.407 38.080
50122 7.6937 23.290 0.330 0.741 -37.954 53.342
50123 -26.4626 37.190 -0.712 0.477 -99.354 46.429
50124 -0.5241 18.789 -0.028 0.978 -37.350 36.302
50125 -9.7245 5.412 -1.797 0.072 -20.332 0.883
50126 5.1077 8.144 0.627 0.531 -10.855 21.070
50129 0.4383 7.387 0.059 0.953 -14.039 14.916
50130 8.1887 22.052 0.371 0.710 -35.033 51.410
50131 2.4712 5.018 0.493 0.622 -7.363 12.306
50135 -7.7712 22.714 -0.342 0.732 -52.290 36.748
50136 -3.3111 45.516 -0.073 0.942 -92.520 85.898
50138 -1.2468 5.961 -0.209 0.834 -12.930 10.436
50140 -1.5484 19.891 -0.078 0.938 -40.534 37.437
50142 2.1466 12.386 0.173 0.862 -22.130 26.423
50144 -14.2001 18.946 -0.749 0.454 -51.335 22.934
50150 -15.2315 61.334 -0.248 0.804 -135.444 104.981
50156 1.6367 15.342 0.107 0.915 -28.434 31.707
50158 -4.6139 4.609 -1.001 0.317 -13.648 4.420
50160 -20.0637 45.515 -0.441 0.659 -109.272 69.145
50161 3.1519 39.936 0.079 0.937 -75.122 81.426
50162 4.6876 83.017 0.056 0.955 -158.023 167.398
50163 -6.5949 15.731 -0.419 0.675 -37.426 24.237
50170 -4.2230 18.636 -0.227 0.821 -40.749 32.303
50171 -2.3062 11.207 -0.206 0.837 -24.272 19.659
50201 -4.6950 7.111 -0.660 0.509 -18.632 9.242
50207 -1.8600 38.489 -0.048 0.961 -77.297 73.577
50208 -4.7194 4.871 -0.969 0.333 -14.266 4.827
50210 3.4820 34.443 0.101 0.919 -64.025 70.989
50211 10.1313 9.821 1.032 0.302 -9.118 29.380
50212 -3.7959 18.486 -0.205 0.837 -40.029 32.437
50213 7.0022 8.085 0.866 0.386 -8.845 22.849
50216 -7.1645 12.788 -0.560 0.575 -32.229 17.900
50219 8.3708 7.344 1.140 0.254 -6.023 22.765
50220 -1.7786 7.528 -0.236 0.813 -16.533 12.976
50225 -24.5009 16.103 -1.521 0.128 -56.063 7.061
50226 2.5729 12.537 0.205 0.837 -22.000 27.145
50228 -29.7824 35.467 -0.840 0.401 -99.296 39.731
50240 -13.0806 38.489 -0.340 0.734 -88.519 62.357
50244 -5.7635 25.327 -0.228 0.820 -55.405 43.877
50247 12.0218 15.777 0.762 0.446 -18.900 42.944
50248 1.2860 11.399 0.113 0.910 -21.056 23.628
50249 -12.3985 22.996 -0.539 0.590 -57.471 32.674
50250 -5.7329 12.281 -0.467 0.641 -29.803 18.337
50251 -12.4684 58.727 -0.212 0.832 -127.572 102.635
50261 -98.1211 52.538 -1.868 0.062 -201.095 4.853
50263 4.7477 5.954 0.797 0.425 -6.923 16.418
50265 -1.8658 3.914 -0.477 0.634 -9.538 5.806
50266 45.0208 4.514 9.973 0.000 36.173 53.868
50273 -8.2800 7.660 -1.081 0.280 -23.293 6.733
50276 -2.1760 34.443 -0.063 0.950 -69.683 65.331
50300 -12.2810 16.663 -0.737 0.461 -44.940 20.378
50309 -102.6366 10.485 -9.789 0.000 -123.187 -82.086
50310 -31.7304 4.916 -6.454 0.000 -41.366 -22.095
50311 7.3928 4.249 1.740 0.082 -0.935 15.721
50312 -15.1329 6.365 -2.377 0.017 -27.609 -2.657
50313 -26.0639 7.411 -3.517 0.000 -40.590 -11.538
50314 5.2666 3.875 1.359 0.174 -2.329 12.862
50315 -23.8340 4.384 -5.437 0.000 -32.426 -15.242
50316 -60.5910 6.187 -9.793 0.000 -72.717 -48.465
50317 -25.3308 3.896 -6.501 0.000 -32.968 -17.694
50320 32.8486 4.323 7.599 0.000 24.376 41.321
50321 -1.1673 5.480 -0.213 0.831 -11.908 9.573
50322 -3.8985 4.493 -0.868 0.386 -12.704 4.907
50323 -2.7250 15.301 -0.178 0.859 -32.714 27.264
50324 -25.0191 15.262 -1.639 0.101 -54.932 4.894
50325 -14.1999 7.134 -1.990 0.047 -28.183 -0.217
50327 -4.3109 18.486 -0.233 0.816 -40.544 31.922
50401 -14.1112 3.984 -3.542 0.000 -21.919 -6.303
50421 -13.2058 15.300 -0.863 0.388 -43.194 16.783
50423 -7.2069 16.401 -0.439 0.660 -39.353 24.939
50424 -3.3914 16.717 -0.203 0.839 -36.157 29.374
50428 -5.5920 5.152 -1.085 0.278 -15.689 4.505
50430 10.3952 37.189 0.280 0.780 -62.494 83.285
50435 13.3039 17.856 0.745 0.456 -21.693 48.301
50436 -10.9910 8.683 -1.266 0.206 -28.009 6.027
50438 -24.6533 15.511 -1.589 0.112 -55.055 5.748
50441 -8.4437 8.222 -1.027 0.304 -24.558 7.671
50450 -10.1527 14.367 -0.707 0.480 -38.311 18.006
50452 4.2868 71.905 0.060 0.952 -136.645 145.218
50456 -31.9335 32.230 -0.991 0.322 -95.103 31.236
50458 -7.1546 30.738 -0.233 0.816 -67.401 53.092
50459 6.9788 11.176 0.624 0.532 -14.926 28.884
50461 -0.6935 7.626 -0.091 0.928 -15.640 14.253
50466 0.2075 35.466 0.006 0.995 -69.305 69.720
50469 8.0932 16.201 0.500 0.617 -23.661 39.847
50472 -25.5296 15.059 -1.695 0.090 -55.046 3.987
50475 -30.4922 39.193 -0.778 0.437 -107.309 46.325
50483 -4.1310 17.348 -0.238 0.812 -38.133 29.871
50501 -6.7320 4.443 -1.515 0.130 -15.441 1.977
50511 -3.0320 6.793 -0.446 0.655 -16.346 10.282
50514 -42.4734 49.358 -0.861 0.390 -139.214 54.267
50517 -4.5785 10.826 -0.423 0.672 -25.797 16.640
50525 -5.4497 15.960 -0.341 0.733 -36.732 25.832
50529 -103.5983 40.726 -2.544 0.011 -183.419 -23.777
50530 5.1168 21.216 0.241 0.809 -36.465 46.699
50533 1.4513 12.059 0.120 0.904 -22.184 25.087
50535 -8.0749 36.587 -0.221 0.825 -79.785 63.635
50536 -6.0549 7.057 -0.858 0.391 -19.886 7.776
50540 -29.3117 83.017 -0.353 0.724 -192.023 133.400
50541 -47.6681 50.874 -0.937 0.349 -147.379 52.043
50542 -86.8969 52.539 -1.654 0.098 -189.871 16.077
50543 -11.1202 14.646 -0.759 0.448 -39.826 17.586
50548 -1.8959 9.021 -0.210 0.834 -19.576 15.784
50554 -3.6233 15.021 -0.241 0.809 -33.063 25.817
50563 5.0879 15.686 0.324 0.746 -25.657 35.832
50569 -35.8643 45.516 -0.788 0.431 -125.074 53.345
50574 0.1913 12.040 0.016 0.987 -23.406 23.789
50579 3.5673 19.359 0.184 0.854 -34.376 41.511
50581 1.8230 34.943 0.052 0.958 -66.664 70.310
50583 -10.0252 8.390 -1.195 0.232 -26.470 6.420
50585 -5.4941 14.540 -0.378 0.706 -33.991 23.003
50588 -8.0050 4.714 -1.698 0.089 -17.245 1.235
50590 -14.1602 35.466 -0.399 0.690 -83.672 55.352
50595 -3.2744 7.281 -0.450 0.653 -17.545 10.996
50597 -1.9411 19.890 -0.098 0.922 -40.925 37.043
50601 -1.0390 21.445 -0.048 0.961 -43.070 40.992
50602 0.6020 17.052 0.035 0.972 -32.820 34.024
50606 14.7401 23.442 0.629 0.529 -31.206 60.686
50613 -5.4575 3.699 -1.475 0.140 -12.707 1.792
50616 -7.3308 7.482 -0.980 0.327 -21.996 7.334
50619 -4.3782 25.137 -0.174 0.862 -53.646 44.889
50621 -2.6679 20.571 -0.130 0.897 -42.987 37.651
50622 0.0443 18.413 0.002 0.998 -36.045 36.133
50627 0.9066 11.687 0.078 0.938 -22.000 23.813
50628 16.6751 29.438 0.566 0.571 -41.022 74.372
50629 -8.3797 22.051 -0.380 0.704 -51.600 34.840
50630 6.9327 23.142 0.300 0.765 -38.425 52.290
50634 11.7941 83.017 0.142 0.887 -150.917 174.505
50635 8.6252 23.753 0.363 0.717 -37.929 55.180
50636 -2.7311 19.890 -0.137 0.891 -41.715 36.253
50638 0.9179 10.291 0.089 0.929 -19.251 21.087
50641 -152.2637 36.588 -4.162 0.000 -223.976 -80.552
50643 -8.0216 32.229 -0.249 0.803 -71.190 55.147
50644 2.7410 7.204 0.380 0.704 -11.379 16.861
50647 -3.3971 9.862 -0.344 0.730 -22.726 15.932
50648 -3.7996 15.301 -0.248 0.804 -33.788 26.189
50651 2.7037 13.132 0.206 0.837 -23.034 28.442
50658 -1.9036 22.051 -0.086 0.931 -45.124 41.316
50659 -3.0345 12.859 -0.236 0.813 -28.239 22.170
50662 -4.6819 8.624 -0.543 0.587 -21.584 12.220
50665 -2.5488 22.179 -0.115 0.909 -46.019 40.922
50667 -13.2569 17.110 -0.775 0.438 -46.793 20.279
50669 -7.9526 35.466 -0.224 0.823 -77.465 61.560
50674 -5.5210 11.382 -0.485 0.628 -27.829 16.787
50675 -19.0237 15.554 -1.223 0.221 -49.510 11.462
50676 1.4611 13.396 0.109 0.913 -24.794 27.716
50677 -2.5882 5.710 -0.453 0.650 -13.779 8.602
50680 -43.2726 44.422 -0.974 0.330 -130.338 43.793
50682 2.0910 28.025 0.075 0.941 -52.836 57.018
50701 -16.7178 4.837 -3.457 0.001 -26.197 -7.238
50702 -16.4732 4.342 -3.794 0.000 -24.984 -7.962
50703 -72.2323 4.494 -16.072 0.000 -81.041 -63.424
50707 -40.2384 8.129 -4.950 0.000 -56.171 -24.305
50801 0.0630 6.692 0.009 0.992 -13.053 13.179
50830 -3.8187 58.727 -0.065 0.948 -118.922 111.285
50833 3.8418 16.401 0.234 0.815 -28.305 35.988
50841 -7.5872 13.506 -0.562 0.574 -34.058 18.884
50846 -7.7487 25.928 -0.299 0.765 -58.566 43.069
50849 -9.2956 14.792 -0.628 0.530 -38.288 19.697
50851 -8.6087 17.288 -0.498 0.619 -42.492 25.275
50854 6.5800 14.539 0.453 0.651 -21.917 35.077
50864 -12.5812 43.403 -0.290 0.772 -97.650 72.488
51001 -5.9024 24.589 -0.240 0.810 -54.097 42.292
51002 -29.9874 54.378 -0.551 0.581 -136.567 76.592
51004 -0.5980 16.055 -0.037 0.970 -32.066 30.870
51005 -20.6372 61.334 -0.336 0.737 -140.850 99.575
51012 -19.9132 7.629 -2.610 0.009 -34.866 -4.960
51023 -7.1054 15.642 -0.454 0.650 -37.762 23.552
51025 -0.8733 11.617 -0.075 0.940 -23.642 21.896
51027 2.2879 33.963 0.067 0.946 -64.279 68.855
51028 -12.0640 28.847 -0.418 0.676 -68.604 44.476
51031 -2.5559 6.126 -0.417 0.676 -14.562 9.450
51034 -4.1896 10.041 -0.417 0.676 -23.870 15.491
51035 3.1191 23.290 0.134 0.893 -42.528 48.767
51038 -12.7361 64.323 -0.198 0.843 -138.807 113.335
51040 -7.2080 7.499 -0.961 0.336 -21.906 7.490
51041 17.6005 12.021 1.464 0.143 -5.959 41.160
51046 -18.0042 18.128 -0.993 0.321 -53.534 17.526
51050 1.9184 17.923 0.107 0.915 -33.210 37.047
51053 -4.7779 52.538 -0.091 0.928 -107.751 98.195
51054 -4.1676 10.758 -0.387 0.698 -25.252 16.917
51055 3.0083 19.274 0.156 0.876 -34.769 40.785
51058 -19.7716 34.443 -0.574 0.566 -87.278 47.735
51101 -19.7871 8.101 -2.442 0.015 -35.665 -3.909
51103 -9.8981 6.168 -1.605 0.109 -21.988 2.192
51104 -8.7683 5.678 -1.544 0.123 -19.897 2.360
51105 -21.7183 6.601 -3.290 0.001 -34.657 -8.780
51106 10.4248 4.836 2.156 0.031 0.947 19.902
51108 -6.1111 8.038 -0.760 0.447 -21.866 9.643
51109 -6.3463 14.755 -0.430 0.667 -35.267 22.574
51201 5.1171 6.941 0.737 0.461 -8.486 18.721
51237 0.5687 18.712 0.030 0.976 -36.107 37.244
51238 9.6976 39.193 0.247 0.805 -67.119 86.515
51240 2.2007 12.765 0.172 0.863 -22.818 27.219
51241 -5.2087 11.906 -0.437 0.662 -28.544 18.126
51245 -7.7163 19.708 -0.392 0.695 -46.343 30.910
51246 -9.6877 9.569 -1.012 0.311 -28.442 9.066
51247 -7.8392 21.104 -0.371 0.710 -49.202 33.524
51248 8.8231 15.512 0.569 0.569 -21.579 39.225
51249 -3.1822 11.115 -0.286 0.775 -24.968 18.603
51250 18.2911 8.054 2.271 0.023 2.506 34.076
51301 -4.4180 5.241 -0.843 0.399 -14.691 5.855
51331 -5.1645 7.858 -0.657 0.511 -20.566 10.237
51334 -5.0916 7.704 -0.661 0.509 -20.192 10.009
51338 -8.8283 76.864 -0.115 0.909 -159.480 141.823
51342 -2.3078 20.779 -0.111 0.912 -43.035 38.419
51346 7.8156 12.040 0.649 0.516 -15.783 31.414
51347 3.0485 35.466 0.086 0.932 -66.464 72.561
51351 1.5295 7.027 0.218 0.828 -12.243 15.302
51355 0.1004 32.230 0.003 0.998 -63.069 63.269
51358 -4.2979 30.067 -0.143 0.886 -63.228 54.632
51360 0.0190 5.560 0.003 0.997 -10.878 10.916
51401 17.5053 5.815 3.010 0.003 6.108 28.903
51442 0.0771 7.159 0.011 0.991 -13.953 14.108
51443 2.7054 28.025 0.097 0.923 -52.223 57.633
51445 0.6973 11.723 0.059 0.953 -22.279 23.673
51449 -2.8651 17.533 -0.163 0.870 -37.230 31.500
51450 -12.7259 14.792 -0.860 0.390 -41.718 16.266
51453 12.9900 117.379 0.111 0.912 -217.069 243.049
51455 -4.0115 12.765 -0.314 0.753 -29.030 21.007
51461 -9.9745 14.300 -0.698 0.485 -38.002 18.053
51462 -9.7198 40.725 -0.239 0.811 -89.539 70.099
51466 -0.1239 54.378 -0.002 0.998 -106.703 106.455
51501 2.3809 3.837 0.620 0.535 -5.140 9.902
51503 5.6901 4.249 1.339 0.181 -2.638 14.018
51510 -58.8503 16.008 -3.676 0.000 -90.226 -27.475
51521 -2.2436 9.811 -0.229 0.819 -21.472 16.985
51526 -5.0958 39.936 -0.128 0.898 -83.370 73.178
51530 14.2455 117.379 0.121 0.903 -215.814 244.305
51534 -15.9805 11.055 -1.445 0.148 -37.649 5.688
51535 -11.4494 90.934 -0.126 0.900 -189.677 166.779
51537 5.9062 8.267 0.714 0.475 -10.298 22.110
51546 3.5996 20.174 0.178 0.858 -35.940 43.140
51551 -61.7359 37.190 -1.660 0.097 -134.628 11.156
51553 0.5729 67.797 0.008 0.993 -132.308 133.454
51555 -3.2086 8.297 -0.387 0.699 -19.470 13.053
51559 9.6207 25.723 0.374 0.708 -40.796 60.037
51560 -9.7561 12.956 -0.753 0.451 -35.150 15.638
51561 -46.6442 18.561 -2.513 0.012 -83.022 -10.266
51566 -10.4752 7.607 -1.377 0.169 -25.386 4.435
51575 -17.5749 41.561 -0.423 0.672 -99.034 63.884
51577 -12.7977 28.291 -0.452 0.651 -68.247 42.652
51579 5.9828 16.505 0.362 0.717 -26.366 38.332
51601 1.6640 8.211 0.203 0.839 -14.429 17.757
51632 -3.0163 7.951 -0.379 0.704 -18.601 12.568
51640 -36.3935 39.193 -0.929 0.353 -113.211 40.424
52001 -4.2894 3.812 -1.125 0.260 -11.760 3.181
52002 4.0070 6.282 0.638 0.524 -8.305 16.319
52003 -11.8010 7.086 -1.665 0.096 -25.690 2.088
52031 -5.8589 13.395 -0.437 0.662 -32.113 20.396
52033 1.2119 16.771 0.072 0.942 -31.660 34.084
52036 -13.0730 46.694 -0.280 0.780 -104.593 78.447
52037 -19.9528 37.822 -0.528 0.598 -94.084 54.178
52040 4.1247 9.162 0.450 0.653 -13.833 22.082
52042 4.7953 22.051 0.217 0.828 -38.425 48.015
52043 -0.4586 18.636 -0.025 0.980 -36.984 36.067
52046 -20.2794 37.823 -0.536 0.592 -94.411 53.853
52052 -4.1976 11.100 -0.378 0.705 -25.954 17.558
52053 3.9616 26.352 0.150 0.881 -47.688 55.611
52057 4.0086 12.493 0.321 0.748 -20.478 28.495
52060 1.7497 6.197 0.282 0.778 -10.396 13.896
52068 -0.4140 27.270 -0.015 0.988 -53.862 53.034
52076 -2.8712 26.799 -0.107 0.915 -55.396 49.654
52079 -29.2503 28.291 -1.034 0.301 -84.700 26.200
52084 -75.9708 10.855 -6.999 0.000 -97.246 -54.696
52087 -7.0022 10.521 -0.666 0.506 -27.624 13.619
52101 -8.5698 6.475 -1.324 0.186 -21.260 4.120
52136 6.2559 8.938 0.700 0.484 -11.262 23.773
52142 -18.3814 24.589 -0.748 0.455 -66.576 29.813
52144 5.0966 16.250 0.314 0.754 -26.753 36.946
52146 -34.7268 29.438 -1.180 0.238 -92.424 22.970
52151 -3.3330 16.772 -0.199 0.842 -36.205 29.539
52154 31.1452 31.836 0.978 0.328 -31.253 93.543
52159 -10.3523 10.063 -1.029 0.304 -30.076 9.371
52162 -14.9808 30.067 -0.498 0.618 -73.911 43.949
52166 5.3714 27.032 0.199 0.842 -47.611 58.354
52172 -2.7345 7.582 -0.361 0.718 -17.595 12.126
52175 -2.7737 11.333 -0.245 0.807 -24.986 19.439
52205 2.7027 8.758 0.309 0.758 -14.463 19.869
52207 -33.4341 39.936 -0.837 0.402 -111.708 44.840
52208 -9.0768 10.278 -0.883 0.377 -29.221 11.068
52211 -5.4183 14.792 -0.366 0.714 -34.411 23.574
52213 5.0373 15.555 0.324 0.746 -25.450 35.524
52214 -21.0266 17.857 -1.178 0.239 -56.025 13.972
52223 7.3775 61.334 0.120 0.904 -112.835 127.591
52224 -7.3855 33.504 -0.220 0.826 -73.052 58.281
52227 -13.2876 31.092 -0.427 0.669 -74.226 47.651
52228 -20.8334 33.964 -0.613 0.540 -87.402 45.735
52233 -37.8710 10.536 -3.595 0.000 -58.520 -17.221
52240 -6.9281 3.548 -1.953 0.051 -13.881 0.025
52241 18.2449 4.222 4.321 0.000 9.969 26.521
52245 -2.4160 7.422 -0.326 0.745 -16.963 12.131
52246 3.8799 6.956 0.558 0.577 -9.755 17.514
52248 -8.8408 30.739 -0.288 0.774 -69.088 51.406
52253 -1.5028 17.409 -0.086 0.931 -35.624 32.619
52254 -8.4216 35.466 -0.237 0.812 -77.934 61.091
52301 13.9906 10.141 1.380 0.168 -5.885 33.866
52302 -1.7461 4.912 -0.355 0.722 -11.374 7.882
52303 6.3331 10.913 0.580 0.562 -15.055 27.721
52306 -56.6558 36.014 -1.573 0.116 -127.242 13.930
52310 0.9092 6.159 0.148 0.883 -11.162 12.980
52314 -7.1847 6.334 -1.134 0.257 -19.598 5.229
52316 6.2083 36.014 0.172 0.863 -64.378 76.794
52317 -14.1641 6.122 -2.314 0.021 -26.162 -2.166
52324 -40.5398 33.062 -1.226 0.220 -105.341 24.261
52327 4.2951 13.854 0.310 0.757 -22.859 31.449
52328 11.7279 143.749 0.082 0.935 -270.016 293.472
52332 -2.3405 11.465 -0.204 0.838 -24.811 20.130
52333 -4.5720 15.731 -0.291 0.771 -35.404 26.260
52336 -9.5144 39.193 -0.243 0.808 -86.331 67.302
52337 -99.9479 67.798 -1.474 0.140 -232.831 32.935
52338 44.7882 13.262 3.377 0.001 18.795 70.781
52340 -8.7741 17.857 -0.491 0.623 -43.773 26.225
52342 -1.7379 7.709 -0.225 0.822 -16.848 13.372
52345 -15.4194 37.823 -0.408 0.684 -89.551 58.712
52347 4.3374 25.523 0.170 0.865 -45.686 54.361
52349 4.0480 14.367 0.282 0.778 -24.111 32.207
52351 2.9520 37.189 0.079 0.937 -69.938 75.842
52352 -30.0249 45.515 -0.660 0.509 -119.234 59.184
52353 -10.7119 6.090 -1.759 0.079 -22.649 1.225
52356 -6.1172 13.209 -0.463 0.643 -32.007 19.772
52358 9.0290 12.789 0.706 0.480 -16.038 34.096
52361 2.7874 12.765 0.218 0.827 -22.231 27.806
52401 10.6881 6.225 1.717 0.086 -1.513 22.889
52402 -4.6156 3.435 -1.344 0.179 -11.348 2.117
52403 -9.0701 6.396 -1.418 0.156 -21.607 3.466
52404 -14.8756 3.948 -3.768 0.000 -22.613 -7.138
52405 -3.3481 4.202 -0.797 0.426 -11.584 4.888
52411 4.8312 6.146 0.786 0.432 -7.214 16.877
52501 4.3080 4.328 0.995 0.320 -4.175 12.791
52531 0.2523 11.270 0.022 0.982 -21.836 22.341
52537 -1.2295 14.470 -0.085 0.932 -29.590 27.131
52544 -10.4153 6.950 -1.499 0.134 -24.037 3.207
52553 -11.5573 36.013 -0.321 0.748 -82.142 59.028
52554 -21.6532 32.638 -0.663 0.507 -85.622 42.316
52556 -10.7956 7.552 -1.429 0.153 -25.598 4.006
52565 5.2998 14.136 0.375 0.708 -22.405 33.005
52571 -5.4152 24.950 -0.217 0.828 -54.317 43.486
52577 -10.5555 5.756 -1.834 0.067 -21.837 0.726
52591 1.0685 12.001 0.089 0.929 -22.454 24.591
52601 -11.7506 4.454 -2.638 0.008 -20.480 -3.021
52623 -29.5609 52.538 -0.563 0.574 -132.535 73.413
52625 -2.9976 49.358 -0.061 0.952 -99.737 93.742
52626 -1.7682 36.014 -0.049 0.961 -72.354 68.817
52627 -9.2686 5.976 -1.551 0.121 -20.982 2.445
52632 6.9464 5.483 1.267 0.205 -3.799 17.692
52637 -5.7046 11.925 -0.478 0.632 -29.077 17.668
52639 -28.4896 33.063 -0.862 0.389 -93.292 36.313
52641 2.6482 6.588 0.402 0.688 -10.264 15.561
52653 -0.1926 14.793 -0.013 0.990 -29.186 28.801
52655 -12.7123 7.811 -1.628 0.104 -28.021 2.596
52656 4.4909 15.554 0.289 0.773 -25.995 34.977
52722 3.2800 4.125 0.795 0.426 -4.804 11.364
52726 9.9088 10.153 0.976 0.329 -9.991 29.809
52728 -30.7783 33.964 -0.906 0.365 -97.346 35.790
52730 -23.3074 17.409 -1.339 0.181 -57.429 10.814
52732 -5.5050 4.526 -1.216 0.224 -14.376 3.366
52733 -22.7024 12.957 -1.752 0.080 -48.097 2.693
52738 -70.2835 12.179 -5.771 0.000 -94.154 -46.413
52742 9.6636 14.540 0.665 0.506 -18.833 38.161
52747 3.3977 13.158 0.258 0.796 -22.392 29.187
52748 -12.7320 8.096 -1.573 0.116 -28.600 3.136
52751 -2.0188 25.523 -0.079 0.937 -52.043 48.005
52753 6.5331 8.703 0.751 0.453 -10.525 23.591
52761 -3.0989 4.247 -0.730 0.466 -11.422 5.224
52768 -4.0831 32.638 -0.125 0.900 -68.054 59.887
52772 2.2827 9.380 0.243 0.808 -16.102 20.668
52776 -4.9321 12.559 -0.393 0.695 -29.548 19.684
52777 -2.4986 39.936 -0.063 0.950 -80.773 75.776
52778 6.8460 11.724 0.584 0.559 -16.133 29.825
52801 -23.7006 143.749 -0.165 0.869 -305.444 258.043
52802 -60.1258 5.848 -10.282 0.000 -71.587 -48.665
52803 -111.9066 7.277 -15.378 0.000 -126.169 -97.644
52804 -27.7476 4.444 -6.244 0.000 -36.457 -19.038
52806 -13.1925 4.913 -2.685 0.007 -22.823 -3.562
52807 7.5408 4.189 1.800 0.072 -0.670 15.751
56201 -7.0070 42.452 -0.165 0.869 -90.212 76.198
712-2 -4.7017 15.138 -0.311 0.756 -34.372 24.969
Omnibus: 367270.389 Durbin-Watson: 2.000
Prob(Omnibus): 0.000 Jarque-Bera (JB): 67275238939.284
Skew: 5.961 Prob(JB): 0.00
Kurtosis: 2451.746 Cond. No. 9.85e+03
In [42]:
#County Scatterplot
plt.scatter(predictions_county, y, s=30, c='r', marker='+', zorder=10)
plt.xlabel("Variables")
plt.ylabel("Sale Dollars")
plt.plot(predictions_county, np.poly1d(np.polyfit(predictions_county, y, 1))(predictions_county))
plt.show()
print("MSE:", model_county.mse_model) ## mean squared error
MSE: 325336187.622
In [43]:
#City Scatterplot
plt.scatter(predictions_city, y, s=30, c='r', marker='+', zorder=10)
plt.xlabel("Variables")
plt.ylabel("Sale Dollars")
plt.plot(predictions_city, np.poly1d(np.polyfit(predictions_city, y, 1))(predictions_city))
plt.show()
print("MSE:", model_city.mse_model) ## mean squared error
MSE: 85316215.6492
In [44]:
#Zip Code Scatterplot
plt.scatter(predictions_zip, y, s=30, c='r', marker='+', zorder=10)
plt.xlabel("Variables")
plt.ylabel("Sale Dollars")
plt.plot(predictions_zip, np.poly1d(np.polyfit(predictions_zip, y, 1))(predictions_zip))
plt.show()
print("MSE:", model_zip.mse_model) ## mean squared error
MSE: 79648697.4319

In the graph above bottles retail and bottles sold are still in a linear configuration.

In [45]:
# Split my data into training and test.
x_train_county, x_test_county, y_train_county, y_test_county = train_test_split(X_county, y, test_size = 0.3, random_state = 1234)
x_train_county = sm.add_constant(x_train_county)
x_test_county = sm.add_constant(x_test_county)
In [46]:
# Split my data into training and test.
x_train_city, x_test_city, y_train_city, y_test_city = train_test_split(X_city, y, test_size = 0.3, random_state = 1234)
x_train_city = sm.add_constant(x_train_city)
x_test_city = sm.add_constant(x_test_city)
In [47]:
# Split my data into training and test.
x_train_zip, x_test_zip, y_train_zip, y_test_zip = train_test_split(X_zip, y, test_size = 0.3, random_state = 1234)
x_train_zip = sm.add_constant(x_train_zip)
x_test_zip = sm.add_constant(x_test_zip)
In [71]:
print(x_train.shape)
print(x_test.shape)
print(X.shape)
print(y_train.shape)
print(y_test.shape)
print(y.shape)
(1884821, 101)
(807781, 101)
(2692602, 100)
(1884821,)
(807781,)
(2692602,)
In [48]:
#County 
lm = linear_model.LinearRegression()
model_lin_county = lm.fit(x_train_county, y_train_county)
predictions_county = lm.predict(x_test_county)
In [49]:
#City 
lm = linear_model.LinearRegression()
model_lin_city = lm.fit(x_train_city, y_train_city)
predictions_city = lm.predict(x_test_city)
In [50]:
#Zip Code
lm = linear_model.LinearRegression()
model_lin_zip = lm.fit(x_train_zip, y_train_zip)
predictions_zip = lm.predict(x_test_zip)

Creating a kfolds cross validation test

In [51]:
#County
scores_county = cross_val_score(model_lin_county, x_train_county, y_train_county, cv=5)
print("Cross validated scores:", scores_county)
print("Average: ", scores_county.mean())
Cross validated scores: [ 0.75449393  0.74423201  0.72962293  0.71827552  0.72573546]
Average:  0.734471967462
In [52]:
#City
scores_city = cross_val_score(model_lin_city, x_train_city, y_train_city, cv=5)
print("Cross validated scores:", scores_city)
print("Average: ", scores_city.mean())
Cross validated scores: [  7.54656701e-01   7.44332737e-01   7.30246737e-01   7.18273801e-01
  -8.12869612e+05]
Average:  -162573.332904
In [53]:
#Zip Codes
scores_zip = cross_val_score(model_lin_zip, x_train_zip, y_train_zip, cv=5)
print("Cross validated scores:", scores_zip)
print("Average: ", scores_zip.mean())
Cross validated scores: [ 0.75582441  0.74495646  0.73070775  0.71923674  0.72683249]
Average:  0.735511569804

Plot your results

Again make sure that you record any valuable information. For example, in the tax scenario, did you find the sales from the first three months of the year to be a good predictor of the total sales for the year? Plot the predictions versus the true values and discuss the successes and limitations of your models

Testing Model:

In [54]:
#County
predictions_county = cross_val_predict(model_lin_county, X_county, y, cv=5)
plt.scatter(y, predictions_county)
plt.xlabel('Actual')
plt.ylabel('Predicted')
accuracy_county = metrics.r2_score(y, predictions_county)
print("Cross Predicted Accuracy:", accuracy_county)
Cross Predicted Accuracy: 0.715240830688
In [55]:
#City
predictions_city = cross_val_predict(model_lin_city, X_city, y, cv=5)
plt.scatter(y, predictions_city)
plt.xlabel('Actual')
plt.ylabel('Predicted')
accuracy_city = metrics.r2_score(y, predictions_city)
print("Cross Predicted Accuracy:", accuracy_city)
Cross Predicted Accuracy: -2.77554534782e+13
In [56]:
#Zip Code
predictions_zip = cross_val_predict(model_lin_zip, X_zip, y, cv=5)
plt.scatter(y, predictions_zip)
plt.xlabel('Actual')
plt.ylabel('Predicted')
accuracy_zip = metrics.r2_score(y, predictions_zip)
print("Cross Predicted Accuracy:", accuracy_zip)
Cross Predicted Accuracy: 0.716408801447
In [ ]:
 

Present the Results

Present your conclusions and results. If you have more than one interesting model feel free to include more than one along with a discussion. Use your work in this notebook to prepare your write-up.

This report provides an analysis and evaluation of the 3 best location within Iowa to build a liquor store. The methods of analysis were putting together linear regression models and running them through a K folds cross validation test. The specific things we were interested in measuring was sale price, number of bottles sold, and bottle retail price. We then separated all variables by County, City, and Zip Code. After running the initial tests we decided to drop city and continue the test via Zip Code and County.

The top 5 counties with the best sales/ bottle ratio for their size were Dallas County, Carroll County, Sioux County, Iowa County, Howard County. The top five zip codes were 50266, 52338, 50320, 52154, 50021. Since zip code 52388 has a winery that is responsible for a majority of the sale we will be dropping it from this list which leaves us with the top three zip codes 50266, 50320, 52154. My first suggestion for a new liquor store would be in 50266.

Additional information that would be needed to perform a more detailed analysis and provide a more accurate location within the zip code would be demographics and population data. With the additional information we should be able to narrow down the location to within a few blocks.

In [ ]: