Marco Tavora
The primary goal of this project is to contribute to the expansion plans of a liquor store owner in Iowa, by investigating the market data for potential new locations.
Furthermore, he is interested in understanding details of the best model that we can fit to the data so that her team can optimally evaluate possible sites for a new storefront.
More concretely, we will predict total sales by county. Note that this is a cross-sectional analysis and no temporal behavior is considered.
Expansion plans traditionally use subsets of the following mix of data:
Demographic information can be introduced in a high level of detail (see for example Ref. [1] for an interesting analysis and useful public datasets). Here we will not delve so deeply into that, but some simple demographical data will be used (namely information about income and population).
We will be interested in several ratios.
i) The first one is the ratio $r_{{\rm{sv}}}^{(c)}$ between sales and volume for each county i.e. the amount of dollars per liter sold. If $r_{{\rm{sv}}}^{(c)}$ is high in county $(c)$, the stores in that county are, on average, high-end stores. The ratio is given by:
\begin{eqnarray} r_{{\text{sv}}}^{(c)} = \frac{{{\text{sales in }}c}}{{{\text{volume consumed in }}c}}\,\,\,\,\,\,(1)\nonumber \end{eqnarray}(where the upper index $(c)$ is for county).
ii) Another relevant ratio is the number of stores per area:
\begin{eqnarray} r_{\text{sa}}^{(c)}=\frac{{\text{number of stores in }}c}{{\text{area of }}c}\,\,\,\,\,(2)\nonumber \end{eqnarray}The meaning of a high value of $r_{\rm{sa}}^{(c)}$ is not so straightforward since it may indicate either
In contrast, a low value of $r_{\rm{sa}}^{(c)}$ may indicate a market with untapped potential or a market with a population not very interested in this type of store (for example, a county with highly religious population).
iii) Another important ratio is consumption/population i.e. the consumption per capita:
\begin{eqnarray} r_{\text{cp}}^{(c)}=\frac{{\text{number of liters consumed in }}c}{{\text{population of }}c}\,\,\,\,\,(3)\nonumber \end{eqnarray}The knowledge of the profile of the population in the county (if they are "light" or "heavy" drinkers) would certainly help the owner decide whether to open or not a new storefront there (see Ref. [1,2,3]).
Competition is a critical component, and can be indirectly measured by the ratio of the number of stores and the population in a given county $c$:
\begin{eqnarray} r_{\text{sp}}^{(c)}=\frac{{\text{number of stores in }}c}{{\text{population of }}c}\,\,\,\,\,(4)\nonumber \end{eqnarray}For this information to be useful we would need more granular data (such as apps check-ins as discussed in [7]). Population and population density will be used as proxies.
There is an issue with the meaning of the data that is discussed in the appendix (see the end of the notebook).
Let us take a look at the data. The dataset contains the spirits purchase information of Iowa Class “E” liquor licensees by product and date of purchase.
import pandas as pd
pd.set_option('display.max_columns', None) # Used to display all columns
df_raw = pd.read_csv('iowa_liquor_sales_proj_2.csv')
df_raw.head()
/anaconda/lib/python2.7/site-packages/IPython/core/interactiveshell.py:2717: DtypeWarning: Columns (6) have mixed types. Specify dtype option on import or set low_memory=False. interactivity=interactivity, compiler=compiler, result=result)
Invoice/Item Number | Date | Store Number | Store Name | Address | City | Zip Code | Store Location | County Number | County | Category | Category Name | Vendor Number | Vendor Name | Item Number | Item Description | Pack | Bottle Volume (ml) | State Bottle Cost | State Bottle Retail | Bottles Sold | Sale (Dollars) | Volume Sold (Liters) | Volume Sold (Gallons) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | S29198800001 | 11/20/2015 | 2191 | Keokuk Spirits | 1013 MAIN | KEOKUK | 52632 | 1013 MAIN\nKEOKUK 52632\n(40.39978, -91.387531) | 56.0 | Lee | NaN | NaN | 255 | Wilson Daniels Ltd. | 297 | Templeton Rye w/Flask | 6 | 750 | $18.09 | $27.14 | 6 | $162.84 | 4.50 | 1.19 |
1 | S29195400002 | 11/21/2015 | 2205 | Ding's Honk And Holler | 900 E WASHINGTON | CLARINDA | 51632 | 900 E WASHINGTON\nCLARINDA 51632\n(40.739238, ... | 73.0 | Page | NaN | NaN | 255 | Wilson Daniels Ltd. | 297 | Templeton Rye w/Flask | 6 | 750 | $18.09 | $27.14 | 12 | $325.68 | 9.00 | 2.38 |
2 | S29050300001 | 11/16/2015 | 3549 | Quicker Liquor Store | 1414 48TH ST | FORT MADISON | 52627 | 1414 48TH ST\nFORT MADISON 52627\n(40.624226, ... | 56.0 | Lee | NaN | NaN | 130 | Disaronno International LLC | 249 | Disaronno Amaretto Cavalli Mignon 3-50ml Pack | 20 | 150 | $6.40 | $9.60 | 2 | $19.20 | 0.30 | 0.08 |
3 | S28867700001 | 11/04/2015 | 2513 | Hy-Vee Food Store #2 / Iowa City | 812 S 1ST AVE | IOWA CITY | 52240 | 812 S 1ST AVE\nIOWA CITY 52240\n | 52.0 | Johnson | NaN | NaN | 65 | Jim Beam Brands | 237 | Knob Creek w/ Crystal Decanter | 3 | 1750 | $35.55 | $53.34 | 3 | $160.02 | 5.25 | 1.39 |
4 | S29050800001 | 11/17/2015 | 3942 | Twin Town Liquor | 104 HIGHWAY 30 WEST | TOLEDO | 52342 | 104 HIGHWAY 30 WEST\nTOLEDO 52342\n(41.985887,... | 86.0 | Tama | NaN | NaN | 130 | Disaronno International LLC | 249 | Disaronno Amaretto Cavalli Mignon 3-50ml Pack | 20 | 150 | $6.40 | $9.60 | 2 | $19.20 | 0.30 | 0.08 |
from __future__ import division
import datetime
from matplotlib import pyplot as plt
import seaborn as sns
import numpy as np
from sklearn import linear_model
import seaborn as sns
%matplotlib inline
import math
import itertools
It is often convenient to eliminate spaces, commas, etc:
df = df_raw.copy()
df.columns = [c.replace('/','_').replace(' ','_').replace(')','').replace('(','').lower() for c in df.columns.tolist()]
Let us see how many years and months for each year we have.
datatime
..year
we see that there is data for only two years, namely, 2015 and 2016..month
we see that the data includes 12 months of 2015 but only 3 months of 2016.For simplicity we will work only with 2015 data (this restriction can be easily generalized if need be).
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y")
print "The years in the data are:", df['date'].dt.year.unique()
The years in the data are: [2015 2016]
for yr in [2015,2016]:
print "The months in year {}".format(yr),"are", sorted(df[df['date'].dt.year == yr]['date'].dt.month.unique())
The months in year 2015 are [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12] The months in year 2016 are [1, 2, 3]
df = df[(df['date'] < '2016-01-01')]
df.shape
(2184483, 24)
print "The fraction of 2016 data is:", round(df.shape[0]/df_raw.shape[0],2)
The fraction of 2016 data is: 0.81
This is a rather large dataset and we can eliminate rows and columns that will not be used.
print "The number of rows and number of columns are:"
print ""
print df.shape[0],"and",df.shape[1]
The number of rows and number of columns are: 2184483 and 24
We will, for now, keep the following columns for the following reasons:
['store_number','county','bottle_volume_ml', 'state_bottle_cost', 'state_bottle_retail',\
'bottles_sold', 'sale_dollars', 'volume_sold_liters']
The date will be dropped since we will not perform any time series analysis.
cols_to_keep = ['store_number','county','bottle_volume_ml', 'state_bottle_cost', 'state_bottle_retail',\
'bottles_sold', 'sale_dollars', 'volume_sold_liters']
df = df[cols_to_keep]
NaNs
and converting objects to floats¶We will:
NaN
values.Two ways of excluding dollar signs are:
1) [x[1:] for x in df['state_bottle_cost']]
2) df['state_bottle_cost'].apply(lambda x: x.strip('$'))
We will go with option 2.
cols_with_dollar = ['state_bottle_cost','state_bottle_retail','sale_dollars']
for col in cols_with_dollar:
df[col] = df[col].apply(lambda x: x.strip('$')).astype('float')
df.dtypes
store_number int64 county object bottle_volume_ml int64 state_bottle_cost float64 state_bottle_retail float64 bottles_sold int64 sale_dollars float64 volume_sold_liters float64 dtype: object
The commands in the cells below check for:
NaN
valuesNaN
values there areWe find that only a tiny fraction of the entries in the county
column is null, which is not problematic. We will drop them.
null = df.isnull().any() # Column of boolens determining presence of nulls
print "Which colune has null values?",null[null == True].index[0]
print "Number of null values in", null[null == True].index[0], "is:",df.isnull().sum().loc['county']
print "Frequency of null values:", str(100*round(100 * df.isnull().sum().loc['county']/df.shape[0],5)) +'%'
Which colune has null values? county Number of null values in county is: 1119 Frequency of null values: 5.122%
df.dropna(inplace=True)
We convert store_numbers
to strings:
df[['store_number']] = df[['store_number']].astype(str)
Our DataFrame
is now:
df.head()
store_number | county | bottle_volume_ml | state_bottle_cost | state_bottle_retail | bottles_sold | sale_dollars | volume_sold_liters | |
---|---|---|---|---|---|---|---|---|
0 | 2191 | Lee | 750 | 18.09 | 27.14 | 6 | 162.84 | 4.50 |
1 | 2205 | Page | 750 | 18.09 | 27.14 | 12 | 325.68 | 9.00 |
2 | 3549 | Lee | 150 | 6.40 | 9.60 | 2 | 19.20 | 0.30 |
3 | 2513 | Johnson | 1750 | 35.55 | 53.34 | 3 | 160.02 | 5.25 |
4 | 3942 | Tama | 150 | 6.40 | 9.60 | 2 | 19.20 | 0.30 |
We use .describe( )
below to better understand the data.
# the 'count' row is not very useful and the store_number column is meaningless in this table
df.describe().iloc[1:,1:]
state_bottle_cost | state_bottle_retail | bottles_sold | sale_dollars | volume_sold_liters | |
---|---|---|---|---|---|
mean | 9.816396 | 14.742246 | 9.877354 | 130.182613 | 8.981275 |
std | 14.626429 | 21.939393 | 23.699125 | 405.568937 | 28.355651 |
min | 0.890000 | 1.340000 | 1.000000 | 1.340000 | 0.000000 |
25% | 5.540000 | 8.310000 | 2.000000 | 30.720000 | 1.600000 |
50% | 8.180000 | 12.300000 | 6.000000 | 70.560000 | 5.250000 |
75% | 11.960000 | 17.940000 | 12.000000 | 135.360000 | 10.500000 |
max | 6100.000000 | 9150.000000 | 3960.000000 | 106326.000000 | 3960.000000 |
Let us look at histograms. We will use the following function (adapted from [1]):
def draw_histograms(df,col,bins):
df[col].hist(bins=bins);
plt.title(col);
plt.xlabel(col);
plt.xticks(rotation=90);
plt.show();
print
cols = ['bottles_sold', 'sale_dollars', 'volume_sold_liters']
for col in cols:
draw_histograms(df, col,bins=10)
There are some obvious problems:
bottle_volume_ml
and volume_sold_liters
there are zero values which can be droppedKeeping outliers in our analysis will inflate the predicted sales. Also, we intend to predict the most likely performance for each store. In other words, we do not want to use exceptionally well-performing stores to make recommendations.
df1 = df.copy()
print np.mean(df1['bottles_sold'])+ 2*np.std(df1['bottles_sold'])
print np.mean(df1['sale_dollars'])+ 2*np.std(df1['sale_dollars'])
print np.mean(df1['volume_sold_liters'])+ 2*np.std(df1['volume_sold_liters'])
57.2755933042 941.320301085 65.6925625834
Let us correct these:
df1 = df.copy()
cutoffs = (df1['bottle_volume_ml'] > 0) & (df1['bottles_sold'] < 50) & (df1['volume_sold_liters'] < 60) & (df1['sale_dollars'] < 900)
df1 = df1[cutoffs]
df1.describe().iloc[1:,1:]
state_bottle_cost | state_bottle_retail | bottles_sold | sale_dollars | volume_sold_liters | |
---|---|---|---|---|---|
mean | 9.745472 | 14.635677 | 7.968786 | 99.364086 | 6.864374 |
std | 6.995164 | 10.492340 | 7.830878 | 100.784662 | 6.635848 |
min | 0.890000 | 1.340000 | 1.000000 | 1.340000 | 0.100000 |
25% | 5.510000 | 8.270000 | 2.000000 | 30.000000 | 1.500000 |
50% | 8.000000 | 12.000000 | 6.000000 | 68.640000 | 4.800000 |
75% | 11.830000 | 17.750000 | 12.000000 | 132.780000 | 10.500000 |
max | 498.640000 | 747.960000 | 48.000000 | 899.980000 | 54.000000 |
Let us look again at histograms.
cols = ['bottles_sold', 'sale_dollars', 'volume_sold_liters']
for col in cols:
draw_histograms(df1, col,bins=10)
Now you are ready to compute the variables you will use for your regression from the data. For example, you may want to compute total sales per store from Jan to March of 2015, mean price per bottle, etc. Refer to the readme for more ideas appropriate to your scenario.
Pandas is your friend for this task. Take a look at the operations here for ideas on how to make the best use of pandas and feel free to search for blog and Stack Overflow posts to help you group data by certain variables and compute sums, means, etc. You may find it useful to create a new data frame to house this summary data.
To obtain the ratios (1)-(4) we need the counties' areas, their populations and the number of stores in them. Let us count the number of stores per county:
df2 = df1.copy()
stores_per_county = df2.groupby(['county'])['store_number'].count()
print stores_per_county.sum()
stores_per_county.head()
2143329
county Adair 4420 Adams 1797 Allamakee 8595 Appanoose 8582 Audubon 2045 Name: store_number, dtype: int64
df1.columns
Index([u'store_number', u'county', u'bottle_volume_ml', u'state_bottle_cost', u'state_bottle_retail', u'bottles_sold', u'sale_dollars', u'volume_sold_liters'], dtype='object')
Let us computer total sales per county:
Let us first include the profit:
df2['profit'] = (df2['state_bottle_retail'] - df2['state_bottle_cost'])*df2["bottles_sold"]
df2.head()
store_number | county | bottle_volume_ml | state_bottle_cost | state_bottle_retail | bottles_sold | sale_dollars | volume_sold_liters | profit | |
---|---|---|---|---|---|---|---|---|---|
0 | 2191 | Lee | 750 | 18.09 | 27.14 | 6 | 162.84 | 4.50 | 54.30 |
1 | 2205 | Page | 750 | 18.09 | 27.14 | 12 | 325.68 | 9.00 | 108.60 |
2 | 3549 | Lee | 150 | 6.40 | 9.60 | 2 | 19.20 | 0.30 | 6.40 |
3 | 2513 | Johnson | 1750 | 35.55 | 53.34 | 3 | 160.02 | 5.25 | 53.37 |
4 | 3942 | Tama | 150 | 6.40 | 9.60 | 2 | 19.20 | 0.30 | 6.40 |
cols = ['county', 'bottle_volume_ml', 'state_bottle_cost', \
'state_bottle_retail', 'bottles_sold', 'sale_dollars', 'volume_sold_liters', 'profit']
df_county = df2[cols].groupby(['county']).sum()
df_county.head()
bottle_volume_ml | state_bottle_cost | state_bottle_retail | bottles_sold | sale_dollars | volume_sold_liters | profit | |
---|---|---|---|---|---|---|---|
county | |||||||
Adair | 4455875 | 40182.62 | 60373.69 | 33807 | 410805.62 | 32522.35 | 137466.12 |
Adams | 1757403 | 18103.82 | 27171.39 | 8446 | 100596.80 | 7547.62 | 33614.78 |
Allamakee | 9079175 | 85371.49 | 128238.38 | 57833 | 772843.46 | 61269.76 | 258630.44 |
Appanoose | 8360175 | 82458.44 | 123839.97 | 58261 | 711376.15 | 53184.66 | 237792.71 |
Audubon | 2043175 | 17737.11 | 26656.07 | 13978 | 159547.63 | 13215.43 | 53393.70 |
df_county['num_stores'] = stores_per_county
df_county.reset_index(level=0, inplace=True)
df_county.head()
county | bottle_volume_ml | state_bottle_cost | state_bottle_retail | bottles_sold | sale_dollars | volume_sold_liters | profit | num_stores | |
---|---|---|---|---|---|---|---|---|---|
0 | Adair | 4455875 | 40182.62 | 60373.69 | 33807 | 410805.62 | 32522.35 | 137466.12 | 4420 |
1 | Adams | 1757403 | 18103.82 | 27171.39 | 8446 | 100596.80 | 7547.62 | 33614.78 | 1797 |
2 | Allamakee | 9079175 | 85371.49 | 128238.38 | 57833 | 772843.46 | 61269.76 | 258630.44 | 8595 |
3 | Appanoose | 8360175 | 82458.44 | 123839.97 | 58261 | 711376.15 | 53184.66 | 237792.71 | 8582 |
4 | Audubon | 2043175 | 17737.11 | 26656.07 | 13978 | 159547.63 | 13215.43 | 53393.70 | 2045 |
df_county.isnull().any()
county False bottle_volume_ml False state_bottle_cost False state_bottle_retail False bottles_sold False sale_dollars False volume_sold_liters False profit False num_stores False dtype: bool
df_county['average_store_profit'] = df_county['profit']/df_county['num_stores']
df_county.head()
county | bottle_volume_ml | state_bottle_cost | state_bottle_retail | bottles_sold | sale_dollars | volume_sold_liters | profit | num_stores | average_store_profit | |
---|---|---|---|---|---|---|---|---|---|---|
0 | Adair | 4455875 | 40182.62 | 60373.69 | 33807 | 410805.62 | 32522.35 | 137466.12 | 4420 | 31.100932 |
1 | Adams | 1757403 | 18103.82 | 27171.39 | 8446 | 100596.80 | 7547.62 | 33614.78 | 1797 | 18.706055 |
2 | Allamakee | 9079175 | 85371.49 | 128238.38 | 57833 | 772843.46 | 61269.76 | 258630.44 | 8595 | 30.090802 |
3 | Appanoose | 8360175 | 82458.44 | 123839.97 | 58261 | 711376.15 | 53184.66 | 237792.71 | 8582 | 27.708309 |
4 | Audubon | 2043175 | 17737.11 | 26656.07 | 13978 | 159547.63 | 13215.43 | 53393.70 | 2045 | 26.109389 |
df_county.isnull().any()
county False bottle_volume_ml False state_bottle_cost False state_bottle_retail False bottles_sold False sale_dollars False volume_sold_liters False profit False num_stores False average_store_profit False dtype: bool
df_county['sales_per_litters'] = df_county['sale_dollars']/df_county['volume_sold_liters']
df_county.describe().loc[['mean','std','max']]
bottle_volume_ml | state_bottle_cost | state_bottle_retail | bottles_sold | sale_dollars | volume_sold_liters | profit | num_stores | average_store_profit | sales_per_litters | |
---|---|---|---|---|---|---|---|---|---|---|
mean | 1.994879e+07 | 2.109874e+05 | 3.168593e+05 | 1.725225e+05 | 2.151211e+06 | 1.486122e+05 | 7.188647e+05 | 21649.787879 | 30.170016 | 13.382942 |
std | 4.068975e+07 | 4.620609e+05 | 6.938726e+05 | 4.054011e+05 | 4.974648e+06 | 3.237530e+05 | 1.662030e+06 | 46543.944487 | 6.068482 | 0.999601 |
max | 3.376040e+08 | 3.868235e+06 | 5.808917e+06 | 3.319741e+06 | 4.132447e+07 | 2.673001e+06 | 1.380706e+07 | 383547.000000 | 44.859888 | 15.770199 |
Consider histograms again:
print df_county.columns.tolist()
['county', 'bottle_volume_ml', 'state_bottle_cost', 'state_bottle_retail', 'bottles_sold', 'sale_dollars', 'volume_sold_liters', 'profit', 'num_stores', 'average_store_profit', 'sales_per_litters']
cols = ['bottles_sold', 'sale_dollars', 'volume_sold_liters']
for col in cols:
draw_histograms(df_county, col,bins=20)
There are two clear outliers which we can remove:
print df_county['bottles_sold'].sort_values(ascending=False).head()
print ""
print df_county.iloc[76].loc['county']
76 3319741 56 1500058 81 1237025 6 1093412 51 927749 Name: bottles_sold, dtype: int64 Polk
print df_county['bottles_sold'].sort_values(ascending=True).head()
print ""
print df_county.iloc[35].loc['county']
35 2512 25 7746 92 8283 1 8446 86 9360 Name: bottles_sold, dtype: int64 Fremont
conditions = (df_county['county'] != 'Polk') & (df_county['county'] != 'Fremont')
df_county = df_county[conditions]
cols = ['bottles_sold', 'sale_dollars', 'volume_sold_liters']
for col in cols:
draw_histograms(df_county, col,bins=20)
df_county.head()
county | bottle_volume_ml | state_bottle_cost | state_bottle_retail | bottles_sold | sale_dollars | volume_sold_liters | profit | num_stores | average_store_profit | sales_per_litters | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | Adair | 4455875 | 40182.62 | 60373.69 | 33807 | 410805.62 | 32522.35 | 137466.12 | 4420 | 31.100932 | 12.631486 |
1 | Adams | 1757403 | 18103.82 | 27171.39 | 8446 | 100596.80 | 7547.62 | 33614.78 | 1797 | 18.706055 | 13.328281 |
2 | Allamakee | 9079175 | 85371.49 | 128238.38 | 57833 | 772843.46 | 61269.76 | 258630.44 | 8595 | 30.090802 | 12.613783 |
3 | Appanoose | 8360175 | 82458.44 | 123839.97 | 58261 | 711376.15 | 53184.66 | 237792.71 | 8582 | 27.708309 | 13.375589 |
4 | Audubon | 2043175 | 17737.11 | 26656.07 | 13978 | 159547.63 | 13215.43 | 53393.70 | 2045 | 26.109389 | 12.072829 |
To obtain both $r_{\rm{sp}}^{(c)}$ (stores per person) and $r_{\rm{cp}}^{(c)}$ (alcohol consumption per person) we need data about the population of each county. This dataset is taken from [5]. These are 2016 population estimates (which will be discussed later).
pop = pd.read_csv('pop_iowa_per_county.csv')
del pop['Unnamed: 0']
pop = pop[pop['county'] != 'Polk']
pop = pop[pop['county'] != 'Fremont']
pop.head()
county | population | |
---|---|---|
0 | Adair | 7092 |
1 | Adams | 3693 |
2 | Allamakee | 13884 |
3 | Appanoose | 12462 |
4 | Audubon | 5678 |
pop.isnull().any()
county False population False dtype: bool
We now append to our dataset columns of population, population per store and consumption per store:
df_new = df_county.copy()
df_new = pd.merge(df_new, pop, on= 'county', how='outer')
df_new.isnull().any()
county False bottle_volume_ml False state_bottle_cost False state_bottle_retail False bottles_sold False sale_dollars False volume_sold_liters False profit False num_stores False average_store_profit False sales_per_litters False population False dtype: bool
df_new['store_population_ratio'] = df_new['num_stores']/df_new['population']
df_new['consumption_per_capita'] = df_new['volume_sold_liters']/df_new['population']
df_new.isnull().any()
county False bottle_volume_ml False state_bottle_cost False state_bottle_retail False bottles_sold False sale_dollars False volume_sold_liters False profit False num_stores False average_store_profit False sales_per_litters False population False store_population_ratio False consumption_per_capita False dtype: bool
It is a good idea to export this DataFrame to keep after so many changes:
df_new.to_csv('df_new.csv')
We not import a table containing among other things, areas per county (source [1]):
pd.set_option('display.max_rows', None)
areas = pd.read_csv('ia_zip_city_county_sqkm.csv')
del areas['Unnamed: 0']
areas.columns = ['zip_code', 'city', 'county', 'state', 'county_number', 'area']
areas = areas[['county','area']].groupby(['county'])[['area']].sum()
areas.reset_index(level=0, inplace=True)
areas = areas[(areas['county'] != 'Polk') & (areas['county'] != 'Fremont')]
areas
county | area | |
---|---|---|
0 | Adair | 1146.149874 |
1 | Adams | 950.420482 |
2 | Allamakee | 1640.663925 |
3 | Appanoose | 1439.625361 |
4 | Audubon | 1011.305224 |
5 | Benton | 1589.039934 |
6 | Black Hawk | 1658.463133 |
7 | Boone | 1172.618576 |
8 | Bremer | 1254.612988 |
9 | Buchanan | 1437.306002 |
10 | Buena Vista | 1641.768588 |
11 | Butler | 1714.432829 |
12 | Calhoun | 1593.242060 |
13 | Carroll | 1790.407255 |
14 | Cass | 1776.544213 |
15 | Cedar | 1232.378198 |
16 | Cerro Gordo | 1497.770139 |
17 | Cherokee | 1404.657609 |
18 | Chickasaw | 1332.800885 |
19 | Clarke | 954.543677 |
20 | Clay | 1305.389452 |
21 | Clayton | 2086.377422 |
22 | Clinton | 1701.089895 |
23 | Crawford | 1794.525720 |
24 | Dallas | 1727.389531 |
25 | Davis | 1202.497109 |
26 | Decatur | 1443.427332 |
27 | Delaware | 1275.461107 |
28 | Des Moines | 928.456413 |
29 | Dickinson | 1024.114722 |
30 | Dubuque | 1445.975306 |
31 | Emmet | 1130.769046 |
32 | Fayette | 1955.698767 |
33 | Floyd | 1245.579774 |
34 | Franklin | 1346.541996 |
36 | Greene | 1385.511780 |
37 | Grundy | 1097.912646 |
38 | Guthrie | 1659.326695 |
39 | Hamilton | 1425.376163 |
40 | Hancock | 1435.986414 |
41 | Hardin | 1749.403531 |
42 | Harrison | 2000.716609 |
43 | Henry | 1374.637080 |
44 | Howard | 1553.199575 |
45 | Humboldt | 1325.822767 |
46 | Ida | 1061.883800 |
47 | Iowa | 1448.032410 |
48 | Jackson | 2075.514783 |
49 | Jasper | 1677.247032 |
50 | Jefferson | 1228.460352 |
51 | Johnson | 1374.398467 |
52 | Jones | 1603.902991 |
53 | Keokuk | 1568.990538 |
54 | Kossuth | 2171.379858 |
55 | Lee | 1207.589119 |
56 | Linn | 2164.866374 |
57 | Louisa | 980.907015 |
58 | Lucas | 1154.264969 |
59 | Lyon | 1520.471701 |
60 | Madison | 1409.747269 |
61 | Mahaska | 1216.496254 |
62 | Marion | 1528.879832 |
63 | Marshall | 1520.107767 |
64 | Mills | 1220.953340 |
65 | Mitchell | 1086.794650 |
66 | Monona | 1656.794562 |
67 | Monroe | 960.814655 |
68 | Montgomery | 1140.359482 |
69 | Muscatine | 1700.941831 |
70 | O'Brien | 1620.431744 |
71 | Osceola | 967.452680 |
72 | Page | 1398.118634 |
73 | Palo Alto | 1596.128377 |
74 | Plymouth | 2259.953641 |
75 | Pocahontas | 1297.431413 |
77 | Pottawattamie | 2303.225231 |
78 | Poweshiek | 1580.076857 |
79 | Ringgold | 1198.624350 |
80 | Sac | 1412.599164 |
81 | Scott | 1211.649675 |
82 | Shelby | 1299.197014 |
83 | Sioux | 2056.178194 |
84 | Story | 1943.690095 |
85 | Tama | 2088.226617 |
86 | Taylor | 1571.159542 |
87 | Union | 1292.945614 |
88 | Van Buren | 1549.129170 |
89 | Wapello | 1104.862750 |
90 | Warren | 1617.493665 |
91 | Washington | 1666.506696 |
92 | Wayne | 1438.930980 |
93 | Webster | 1793.295141 |
94 | Winnebago | 1295.390700 |
95 | Winneshiek | 1587.428273 |
96 | Woodbury | 2591.144872 |
97 | Worth | 925.069678 |
98 | Wright | 1542.560045 |
df_new = pd.merge(df_new, areas, on= 'county', how='outer')
df_new
county | bottle_volume_ml | state_bottle_cost | state_bottle_retail | bottles_sold | sale_dollars | volume_sold_liters | profit | num_stores | average_store_profit | sales_per_litters | population | store_population_ratio | consumption_per_capita | area | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Adair | 4455875 | 40182.62 | 6.037369e+04 | 33807 | 4.108056e+05 | 32522.35 | 1.374661e+05 | 4420 | 31.100932 | 12.631486 | 7092 | 0.623237 | 4.585780 | 1146.149874 |
1 | Adams | 1757403 | 18103.82 | 2.717139e+04 | 8446 | 1.005968e+05 | 7547.62 | 3.361478e+04 | 1797 | 18.706055 | 13.328281 | 3693 | 0.486596 | 2.043764 | 950.420482 |
2 | Allamakee | 9079175 | 85371.49 | 1.282384e+05 | 57833 | 7.728435e+05 | 61269.76 | 2.586304e+05 | 8595 | 30.090802 | 12.613783 | 13884 | 0.619058 | 4.412976 | 1640.663925 |
3 | Appanoose | 8360175 | 82458.44 | 1.238400e+05 | 58261 | 7.113762e+05 | 53184.66 | 2.377927e+05 | 8582 | 27.708309 | 13.375589 | 12462 | 0.688654 | 4.267747 | 1439.625361 |
4 | Audubon | 2043175 | 17737.11 | 2.665607e+04 | 13978 | 1.595476e+05 | 13215.43 | 5.339370e+04 | 2045 | 26.109389 | 12.072829 | 5678 | 0.360162 | 2.327480 | 1011.305224 |
5 | Benton | 7790725 | 71349.04 | 1.071595e+05 | 47699 | 5.803559e+05 | 47612.84 | 1.940870e+05 | 7731 | 25.105032 | 12.189062 | 25699 | 0.300829 | 1.852712 | 1589.039934 |
6 | Black Hawk | 100801227 | 1107541.37 | 1.663275e+06 | 1093412 | 1.192076e+07 | 793399.13 | 3.982880e+06 | 118229 | 33.687840 | 15.024919 | 132904 | 0.889582 | 5.969716 | 1658.463133 |
7 | Boone | 16005831 | 154407.11 | 2.319393e+05 | 120271 | 1.534234e+06 | 115554.84 | 5.129844e+05 | 16515 | 31.061724 | 13.277107 | 26532 | 0.622456 | 4.355301 | 1172.618576 |
8 | Bremer | 17767453 | 173326.87 | 2.603925e+05 | 113551 | 1.487074e+06 | 114021.43 | 4.976283e+05 | 17746 | 28.041717 | 13.042060 | 24798 | 0.715622 | 4.598009 | 1254.612988 |
9 | Buchanan | 12168450 | 121504.75 | 1.824907e+05 | 89429 | 1.140914e+06 | 84484.18 | 3.814875e+05 | 12653 | 30.149961 | 13.504469 | 20992 | 0.602753 | 4.024589 | 1437.306002 |
10 | Buena Vista | 20832831 | 226554.99 | 3.402451e+05 | 114147 | 1.532283e+06 | 105358.28 | 5.121251e+05 | 21598 | 23.711690 | 14.543544 | 20332 | 1.062266 | 5.181895 | 1641.768588 |
11 | Butler | 3348800 | 27994.24 | 4.206102e+04 | 24615 | 2.786351e+05 | 25010.20 | 9.331105e+04 | 3308 | 28.207693 | 11.140858 | 14791 | 0.223650 | 1.690907 | 1714.432829 |
12 | Calhoun | 3787450 | 30301.41 | 4.553630e+04 | 27214 | 3.215042e+05 | 27977.64 | 1.075994e+05 | 3625 | 29.682604 | 11.491468 | 9846 | 0.368170 | 2.841523 | 1593.242060 |
13 | Carroll | 14460725 | 143097.04 | 2.149119e+05 | 114246 | 1.556718e+06 | 115511.56 | 5.202122e+05 | 14568 | 35.709236 | 13.476729 | 20437 | 0.712825 | 5.652080 | 1790.407255 |
14 | Cass | 10038675 | 98657.96 | 1.481477e+05 | 74583 | 1.005639e+06 | 75880.71 | 3.360579e+05 | 9988 | 33.646165 | 13.252889 | 13157 | 0.759140 | 5.767326 | 1776.544213 |
15 | Cedar | 7427125 | 67426.72 | 1.013055e+05 | 44550 | 4.927014e+05 | 38720.95 | 1.647957e+05 | 8160 | 20.195545 | 12.724413 | 18454 | 0.442181 | 2.098242 | 1232.378198 |
16 | Cerro Gordo | 50439956 | 514818.96 | 7.732206e+05 | 377860 | 4.892826e+06 | 354343.00 | 1.635296e+06 | 51566 | 31.712671 | 13.808162 | 43070 | 1.197260 | 8.227142 | 1497.770139 |
17 | Cherokee | 7638400 | 72195.69 | 1.084669e+05 | 58423 | 6.855181e+05 | 52491.70 | 2.294092e+05 | 7408 | 30.967770 | 13.059553 | 11508 | 0.643726 | 4.561323 | 1404.657609 |
18 | Chickasaw | 4325975 | 37139.09 | 5.580234e+04 | 27003 | 3.508606e+05 | 30429.56 | 1.173801e+05 | 4031 | 29.119360 | 11.530256 | 12023 | 0.335274 | 2.530946 | 1332.800885 |
19 | Clarke | 5656450 | 54720.08 | 8.218209e+04 | 37286 | 5.035890e+05 | 36768.25 | 1.683819e+05 | 5600 | 30.068200 | 13.696300 | 9309 | 0.601568 | 3.949753 | 954.543677 |
20 | Clay | 15084525 | 151187.65 | 2.270741e+05 | 104645 | 1.367561e+06 | 104603.09 | 4.569864e+05 | 15276 | 29.915320 | 13.073807 | 16333 | 0.935284 | 6.404402 | 1305.389452 |
21 | Clayton | 11180875 | 108046.71 | 1.622809e+05 | 49974 | 6.927068e+05 | 52494.39 | 2.315675e+05 | 10480 | 22.096132 | 13.195825 | 17590 | 0.595793 | 2.984331 | 2086.377422 |
22 | Clinton | 25808600 | 255757.84 | 3.841091e+05 | 245180 | 2.928261e+06 | 214882.76 | 9.784796e+05 | 28364 | 34.497237 | 13.627250 | 47309 | 0.599548 | 4.542112 | 1701.089895 |
23 | Crawford | 9178228 | 91880.62 | 1.379796e+05 | 65790 | 8.643920e+05 | 62535.95 | 2.889041e+05 | 9360 | 30.865818 | 13.822321 | 16940 | 0.552538 | 3.691615 | 1794.525720 |
24 | Dallas | 19782703 | 208239.38 | 3.128030e+05 | 166032 | 2.295133e+06 | 151828.96 | 7.673762e+05 | 21284 | 36.054136 | 15.116572 | 84516 | 0.251834 | 1.796452 | 1727.389531 |
25 | Davis | 1623500 | 14206.10 | 2.134244e+04 | 7746 | 9.618596e+04 | 8004.00 | 3.217266e+04 | 1555 | 20.689814 | 12.017236 | 8860 | 0.175508 | 0.903386 | 1202.497109 |
26 | Decatur | 1798250 | 17836.60 | 2.679681e+04 | 14389 | 1.763645e+05 | 12243.11 | 5.898266e+04 | 1994 | 29.580070 | 14.405203 | 8141 | 0.244933 | 1.503883 | 1443.427332 |
27 | Delaware | 5929650 | 56982.70 | 8.557961e+04 | 55212 | 7.547227e+05 | 54845.42 | 2.522007e+05 | 5749 | 43.868622 | 13.760907 | 17327 | 0.331794 | 3.165315 | 1275.461107 |
28 | Des Moines | 28965403 | 311376.14 | 4.674807e+05 | 294257 | 3.579586e+06 | 235900.79 | 1.195418e+06 | 32474 | 36.811526 | 15.174117 | 39739 | 0.817182 | 5.936254 | 928.456413 |
29 | Dickinson | 27326006 | 291810.42 | 4.382073e+05 | 222744 | 3.145660e+06 | 224310.44 | 1.050838e+06 | 28110 | 37.383072 | 14.023690 | 17243 | 1.630227 | 13.008783 | 1024.114722 |
30 | Dubuque | 57322060 | 601825.15 | 9.038367e+05 | 501274 | 6.621999e+06 | 460080.11 | 2.212785e+06 | 60753 | 36.422653 | 14.393144 | 97003 | 0.626300 | 4.742947 | 1445.975306 |
31 | Emmet | 5982525 | 60879.97 | 9.144739e+04 | 38693 | 5.019728e+05 | 39072.45 | 1.678386e+05 | 6071 | 27.645955 | 12.847230 | 9658 | 0.628598 | 4.045605 | 1130.769046 |
32 | Fayette | 9109128 | 81374.00 | 1.222869e+05 | 70231 | 8.602044e+05 | 68175.75 | 2.878219e+05 | 8730 | 32.969291 | 12.617454 | 20054 | 0.435325 | 3.399609 | 1955.698767 |
33 | Floyd | 8058678 | 79010.19 | 1.186743e+05 | 82689 | 1.032124e+06 | 74542.85 | 3.450133e+05 | 8223 | 41.957103 | 13.846056 | 15873 | 0.518050 | 4.696204 | 1245.579774 |
34 | Franklin | 6146325 | 60180.12 | 9.038277e+04 | 36313 | 4.522624e+05 | 35596.58 | 1.512945e+05 | 6052 | 24.999091 | 12.705219 | 10170 | 0.595084 | 3.500155 | 1346.541996 |
35 | Greene | 5200525 | 53392.20 | 8.019890e+04 | 35511 | 4.774988e+05 | 33966.78 | 1.596316e+05 | 5248 | 30.417611 | 14.057818 | 9011 | 0.582399 | 3.769480 | 1385.511780 |
36 | Grundy | 4369250 | 36670.31 | 5.510175e+04 | 26210 | 3.018362e+05 | 26267.96 | 1.009891e+05 | 4341 | 23.264027 | 11.490660 | 12313 | 0.352554 | 2.133352 | 1097.912646 |
37 | Guthrie | 3493600 | 31390.41 | 4.717285e+04 | 24678 | 2.940075e+05 | 23023.47 | 9.837735e+04 | 3398 | 28.951545 | 12.769903 | 10625 | 0.319812 | 2.166915 | 1659.326695 |
38 | Hamilton | 7882325 | 77319.08 | 1.161721e+05 | 61011 | 7.369458e+05 | 56950.35 | 2.466456e+05 | 8603 | 28.669721 | 12.940146 | 15076 | 0.570642 | 3.777550 | 1425.376163 |
39 | Hancock | 3509425 | 28221.14 | 4.241179e+04 | 26820 | 3.061684e+05 | 28072.83 | 1.025907e+05 | 3101 | 33.083089 | 10.906219 | 10835 | 0.286202 | 2.590940 | 1435.986414 |
40 | Hardin | 15096700 | 136991.10 | 2.058020e+05 | 119772 | 1.657238e+06 | 127383.22 | 5.542669e+05 | 13592 | 40.778908 | 13.009862 | 17226 | 0.789040 | 7.394823 | 1749.403531 |
41 | Harrison | 8333900 | 82437.45 | 1.238054e+05 | 42380 | 5.046314e+05 | 38339.84 | 1.687135e+05 | 9131 | 18.476999 | 13.162064 | 14149 | 0.645346 | 2.709721 | 2000.716609 |
42 | Henry | 8214425 | 84936.53 | 1.275347e+05 | 72982 | 9.634271e+05 | 67508.46 | 3.218625e+05 | 9026 | 35.659479 | 14.271205 | 19773 | 0.456481 | 3.414174 | 1374.637080 |
43 | Howard | 5501853 | 50667.53 | 7.612342e+04 | 38946 | 5.268240e+05 | 42435.81 | 1.762160e+05 | 5197 | 33.907264 | 12.414609 | 9332 | 0.556901 | 4.547344 | 1553.199575 |
44 | Humboldt | 4838675 | 46974.26 | 7.055336e+04 | 40013 | 5.050238e+05 | 38779.60 | 1.687570e+05 | 4967 | 33.975637 | 13.022925 | 9487 | 0.523559 | 4.087657 | 1325.822767 |
45 | Ida | 4907150 | 49504.88 | 7.433259e+04 | 35771 | 4.983864e+05 | 34949.87 | 1.665311e+05 | 4777 | 34.861015 | 14.260037 | 6985 | 0.683894 | 5.003560 | 1061.883800 |
46 | Iowa | 11862800 | 102657.53 | 1.542814e+05 | 74451 | 9.711115e+05 | 77025.04 | 3.251289e+05 | 11162 | 29.128192 | 12.607737 | 16311 | 0.684323 | 4.722276 | 1448.032410 |
47 | Jackson | 12728325 | 114843.67 | 1.724907e+05 | 89098 | 1.130876e+06 | 88084.97 | 3.778944e+05 | 12593 | 30.008287 | 12.838469 | 19472 | 0.646724 | 4.523673 | 2075.514783 |
48 | Jasper | 21363900 | 214003.36 | 3.214453e+05 | 131828 | 1.577656e+06 | 121386.17 | 5.274477e+05 | 23046 | 22.886734 | 12.996996 | 36708 | 0.627820 | 3.306804 | 1677.247032 |
49 | Jefferson | 6377025 | 66329.72 | 9.959432e+04 | 56831 | 7.276337e+05 | 51721.61 | 2.430598e+05 | 6446 | 37.707074 | 14.068272 | 18090 | 0.356329 | 2.859127 | 1228.460352 |
50 | Johnson | 94201143 | 1094410.17 | 1.643462e+06 | 927749 | 1.237021e+07 | 784403.93 | 4.133645e+06 | 105955 | 39.013213 | 15.770199 | 146547 | 0.723010 | 5.352576 | 1374.398467 |
51 | Jones | 15309100 | 144067.53 | 2.164019e+05 | 69101 | 8.898679e+05 | 68061.75 | 2.974959e+05 | 15545 | 19.137725 | 13.074420 | 20439 | 0.760556 | 3.329994 | 1603.902991 |
52 | Keokuk | 2720900 | 23814.27 | 3.576359e+04 | 13554 | 1.495608e+05 | 13026.40 | 4.999794e+04 | 2797 | 17.875560 | 11.481359 | 10119 | 0.276411 | 1.287321 | 1568.990538 |
53 | Kossuth | 14172628 | 142789.09 | 2.144583e+05 | 105881 | 1.479812e+06 | 108007.85 | 4.948291e+05 | 13720 | 36.066263 | 13.700965 | 15114 | 0.907768 | 7.146212 | 2171.379858 |
54 | Lee | 24974375 | 267566.06 | 4.017307e+05 | 228466 | 2.902471e+06 | 201627.37 | 9.692238e+05 | 27261 | 35.553493 | 14.395224 | 34615 | 0.787549 | 5.824855 | 1207.589119 |
55 | Linn | 158353452 | 1731106.01 | 2.599613e+06 | 1500058 | 1.806775e+07 | 1221208.85 | 6.036068e+06 | 184092 | 32.788321 | 14.794975 | 221661 | 0.830511 | 5.509354 | 2164.866374 |
56 | Louisa | 3165850 | 31041.02 | 4.661010e+04 | 20602 | 2.071461e+05 | 14809.68 | 6.917951e+04 | 3730 | 18.546786 | 13.987210 | 11142 | 0.334769 | 1.329176 | 980.907015 |
57 | Lucas | 3968675 | 40531.53 | 6.087117e+04 | 29011 | 3.761450e+05 | 27391.10 | 1.257357e+05 | 4064 | 30.938898 | 13.732379 | 8647 | 0.469990 | 3.167700 | 1154.264969 |
58 | Lyon | 9776875 | 98319.84 | 1.476554e+05 | 55318 | 7.766658e+05 | 56613.63 | 2.593880e+05 | 9647 | 26.887943 | 13.718707 | 11754 | 0.820742 | 4.816542 | 1520.471701 |
59 | Madison | 7396475 | 74647.41 | 1.121233e+05 | 54411 | 6.856734e+05 | 51462.23 | 2.293326e+05 | 7795 | 29.420481 | 13.323817 | 15848 | 0.491860 | 3.247238 | 1409.747269 |
60 | Mahaska | 9398550 | 96610.21 | 1.451289e+05 | 70861 | 8.393837e+05 | 62911.85 | 2.806475e+05 | 9670 | 29.022495 | 13.342219 | 22181 | 0.435959 | 2.836295 | 1216.496254 |
61 | Marion | 19736978 | 199812.05 | 3.000452e+05 | 133175 | 1.735448e+06 | 123141.99 | 5.798673e+05 | 20958 | 27.668067 | 14.093066 | 33189 | 0.631474 | 3.710325 | 1528.879832 |
62 | Marshall | 22552328 | 231546.74 | 3.477674e+05 | 192655 | 2.504138e+06 | 168340.47 | 8.368677e+05 | 23507 | 35.600787 | 14.875434 | 40312 | 0.583127 | 4.175939 | 1520.107767 |
63 | Mills | 4317200 | 38409.79 | 5.770141e+04 | 37547 | 4.345716e+05 | 34164.64 | 1.452974e+05 | 3968 | 36.617293 | 12.719923 | 14972 | 0.265028 | 2.281902 | 1220.953340 |
64 | Mitchell | 8198000 | 80007.93 | 1.201944e+05 | 32335 | 4.147482e+05 | 34245.82 | 1.387916e+05 | 8227 | 16.870258 | 12.110916 | 10763 | 0.764378 | 3.181810 | 1086.794650 |
65 | Monona | 9671381 | 95956.98 | 1.440858e+05 | 42020 | 5.197005e+05 | 38043.92 | 1.736055e+05 | 10023 | 17.320713 | 13.660539 | 8898 | 1.126433 | 4.275559 | 1656.794562 |
66 | Monroe | 2891525 | 28812.61 | 4.327483e+04 | 21971 | 2.896006e+05 | 20937.96 | 9.680632e+04 | 3000 | 32.268773 | 13.831367 | 7870 | 0.381194 | 2.660478 | 960.814655 |
67 | Montgomery | 6457078 | 67065.99 | 1.006981e+05 | 45167 | 5.570222e+05 | 41624.63 | 1.861282e+05 | 6713 | 27.726534 | 13.382034 | 10225 | 0.656528 | 4.070868 | 1140.359482 |
68 | Muscatine | 28497706 | 294376.85 | 4.421020e+05 | 208296 | 2.626114e+06 | 188612.00 | 8.775137e+05 | 31591 | 27.777331 | 13.923365 | 42940 | 0.735701 | 4.392455 | 1700.941831 |
69 | O'Brien | 13896728 | 132245.41 | 1.986505e+05 | 83029 | 1.057797e+06 | 84579.60 | 3.539820e+05 | 13954 | 25.367782 | 12.506521 | 14020 | 0.995292 | 6.032782 | 1620.431744 |
70 | Osceola | 3107475 | 29141.20 | 4.377315e+04 | 19244 | 2.562448e+05 | 20104.98 | 8.574306e+04 | 2981 | 28.763187 | 12.745341 | 6064 | 0.491590 | 3.315465 | 967.452680 |
71 | Page | 10230328 | 103322.96 | 1.551729e+05 | 81116 | 1.057103e+06 | 75315.10 | 3.532156e+05 | 10651 | 33.162667 | 14.035741 | 15391 | 0.692028 | 4.893451 | 1398.118634 |
72 | Palo Alto | 9396700 | 89935.75 | 1.350478e+05 | 41949 | 5.750134e+05 | 44627.15 | 1.921719e+05 | 8667 | 22.172831 | 12.884834 | 9047 | 0.957997 | 4.932812 | 1596.128377 |
73 | Plymouth | 12560609 | 132624.74 | 1.991814e+05 | 94325 | 1.286709e+06 | 93061.03 | 4.300381e+05 | 13176 | 32.637984 | 13.826506 | 25200 | 0.522857 | 3.692898 | 2259.953641 |
74 | Pocahontas | 4312750 | 37584.77 | 5.645243e+04 | 24363 | 2.971895e+05 | 25680.47 | 9.933154e+04 | 4167 | 23.837663 | 11.572587 | 6886 | 0.605141 | 3.729374 | 1297.431413 |
75 | Pottawattamie | 64280187 | 680102.80 | 1.021141e+06 | 618358 | 7.675324e+06 | 521376.00 | 2.563352e+06 | 72472 | 35.370233 | 14.721285 | 93582 | 0.774422 | 5.571328 | 2303.225231 |
76 | Poweshiek | 16062900 | 169084.71 | 2.539540e+05 | 97474 | 1.306022e+06 | 91675.42 | 4.365483e+05 | 17496 | 24.951321 | 14.246155 | 18533 | 0.944046 | 4.946604 | 1580.076857 |
77 | Ringgold | 1689850 | 14072.68 | 2.115156e+04 | 10061 | 1.386808e+05 | 11080.29 | 4.642018e+04 | 1476 | 31.449986 | 12.515991 | 5068 | 0.291239 | 2.186324 | 1198.624350 |
78 | Sac | 7934625 | 75245.54 | 1.130183e+05 | 39571 | 5.258452e+05 | 42670.28 | 1.760915e+05 | 7157 | 24.604098 | 12.323453 | 9876 | 0.724686 | 4.320603 | 1412.599164 |
79 | Scott | 112482102 | 1247750.11 | 1.873693e+06 | 1237025 | 1.397399e+07 | 914829.35 | 4.667888e+06 | 130419 | 35.791469 | 15.274967 | 172474 | 0.756166 | 5.304158 | 1211.649675 |
80 | Shelby | 5286953 | 53350.35 | 8.009987e+04 | 42963 | 5.279510e+05 | 38570.77 | 1.762560e+05 | 5627 | 31.323259 | 13.687852 | 11800 | 0.476864 | 3.268709 | 1299.197014 |
81 | Sioux | 10681100 | 102003.82 | 1.532028e+05 | 77121 | 1.069835e+06 | 79019.86 | 3.575951e+05 | 10307 | 34.694396 | 13.538806 | 34898 | 0.295346 | 2.264309 | 2056.178194 |
82 | Story | 65551096 | 727466.68 | 1.092625e+06 | 501832 | 6.730961e+06 | 456414.62 | 2.250349e+06 | 71636 | 31.413665 | 14.747469 | 97090 | 0.737831 | 4.700944 | 1943.690095 |
83 | Tama | 7242875 | 69664.27 | 1.046049e+05 | 47692 | 5.735157e+05 | 43573.67 | 1.916543e+05 | 7655 | 25.036489 | 13.161979 | 17319 | 0.442000 | 2.515946 | 2088.226617 |
84 | Taylor | 2073825 | 20885.05 | 3.135727e+04 | 9360 | 1.105983e+05 | 8433.72 | 3.693200e+04 | 2304 | 16.029514 | 13.113828 | 6216 | 0.370656 | 1.356776 | 1571.159542 |
85 | Union | 9569175 | 90467.23 | 1.358550e+05 | 66536 | 9.021116e+05 | 68179.74 | 3.015172e+05 | 9024 | 33.412815 | 13.231373 | 12420 | 0.726570 | 5.489512 | 1292.945614 |
86 | Van Buren | 1883075 | 18970.16 | 2.848695e+04 | 12806 | 1.742457e+05 | 12768.44 | 5.823345e+04 | 1920 | 30.329922 | 13.646592 | 7271 | 0.264063 | 1.756078 | 1549.129170 |
87 | Wapello | 24021278 | 251492.94 | 3.776542e+05 | 186724 | 2.292624e+06 | 160501.09 | 7.658786e+05 | 27843 | 27.507043 | 14.284163 | 34982 | 0.795924 | 4.588105 | 1104.862750 |
88 | Warren | 19568800 | 183359.55 | 2.753840e+05 | 159019 | 1.939043e+06 | 146705.53 | 6.481145e+05 | 19620 | 33.033359 | 13.217243 | 49691 | 0.394840 | 2.952356 | 1617.493665 |
89 | Washington | 10984531 | 119302.23 | 1.791272e+05 | 84389 | 1.140731e+06 | 78241.80 | 3.810510e+05 | 11745 | 32.443678 | 14.579561 | 22281 | 0.527131 | 3.511593 | 1666.506696 |
90 | Wayne | 1344500 | 12518.51 | 1.880652e+04 | 8283 | 1.054384e+05 | 8001.09 | 3.526299e+04 | 1268 | 27.809929 | 13.177998 | 6452 | 0.196528 | 1.240095 | 1438.930980 |
91 | Webster | 22537959 | 231995.40 | 3.484066e+05 | 228252 | 2.705835e+06 | 189009.94 | 9.042916e+05 | 24106 | 37.513134 | 14.315835 | 36769 | 0.655607 | 5.140470 | 1793.295141 |
92 | Winnebago | 8009300 | 72833.61 | 1.094025e+05 | 55931 | 6.827292e+05 | 56888.63 | 2.282976e+05 | 7667 | 29.776648 | 12.001155 | 10631 | 0.721193 | 5.351202 | 1295.390700 |
93 | Winneshiek | 11450225 | 107156.18 | 1.609449e+05 | 86483 | 1.189769e+06 | 89488.04 | 3.977413e+05 | 10541 | 37.732790 | 13.295288 | 20561 | 0.512670 | 4.352319 | 1587.428273 |
94 | Woodbury | 61274387 | 672801.78 | 1.010242e+06 | 620680 | 7.695638e+06 | 512631.64 | 2.570501e+06 | 67732 | 37.951055 | 15.012023 | 102779 | 0.659006 | 4.987708 | 2591.144872 |
95 | Worth | 3247475 | 28918.72 | 4.343867e+04 | 20112 | 2.466353e+05 | 20602.25 | 8.250434e+04 | 3000 | 27.501447 | 11.971281 | 7572 | 0.396197 | 2.720847 | 925.069678 |
96 | Wright | 5575150 | 51158.24 | 7.684774e+04 | 46517 | 5.706450e+05 | 44906.82 | 1.908072e+05 | 5461 | 34.939967 | 12.707313 | 12779 | 0.427342 | 3.514111 | 1542.560045 |
df_new.isnull().any()
county False bottle_volume_ml False state_bottle_cost False state_bottle_retail False bottles_sold False sale_dollars False volume_sold_liters False profit False num_stores False average_store_profit False sales_per_litters False population False store_population_ratio False consumption_per_capita False area False dtype: bool
df_new['stores_per_area'] = df_new['num_stores']/df_new['area']
df_new.head()
county | bottle_volume_ml | state_bottle_cost | state_bottle_retail | bottles_sold | sale_dollars | volume_sold_liters | profit | num_stores | average_store_profit | sales_per_litters | population | store_population_ratio | consumption_per_capita | area | stores_per_area | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Adair | 4455875 | 40182.62 | 60373.69 | 33807 | 410805.62 | 32522.35 | 137466.12 | 4420 | 31.100932 | 12.631486 | 7092 | 0.623237 | 4.585780 | 1146.149874 | 3.856389 |
1 | Adams | 1757403 | 18103.82 | 27171.39 | 8446 | 100596.80 | 7547.62 | 33614.78 | 1797 | 18.706055 | 13.328281 | 3693 | 0.486596 | 2.043764 | 950.420482 | 1.890742 |
2 | Allamakee | 9079175 | 85371.49 | 128238.38 | 57833 | 772843.46 | 61269.76 | 258630.44 | 8595 | 30.090802 | 12.613783 | 13884 | 0.619058 | 4.412976 | 1640.663925 | 5.238733 |
3 | Appanoose | 8360175 | 82458.44 | 123839.97 | 58261 | 711376.15 | 53184.66 | 237792.71 | 8582 | 27.708309 | 13.375589 | 12462 | 0.688654 | 4.267747 | 1439.625361 | 5.961273 |
4 | Audubon | 2043175 | 17737.11 | 26656.07 | 13978 | 159547.63 | 13215.43 | 53393.70 | 2045 | 26.109389 | 12.072829 | 5678 | 0.360162 | 2.327480 | 1011.305224 | 2.022139 |
# Taking care of outliers:
#df4 = df4[df4['County'] != 'Fremont']
#df4 = df2[df4['County'] != 'Davis']
income = pd.read_excel('iowa_incomes.xls')
col_names = [c.replace('/','_').replace(' ','_').replace(')','').replace('(','').lower() for c in income.columns.tolist()]
income.columns = col_names
income.head(5)
county | per_capita_income | median_household_income | median_family_income | number_of_households | |
---|---|---|---|---|---|
0 | Adair | 23497 | 45202 | 57287 | 3292 |
1 | Adams | 23549 | 40368 | 52782 | 1715 |
2 | Allamakee | 21349 | 46623 | 55926 | 5845 |
3 | Appanoose | 20084 | 34689 | 41250 | 5627 |
4 | Audubon | 24207 | 42717 | 58641 | 2617 |
df_new = df_new.merge(income, how='left', on='county')
df_new.head()
county | bottle_volume_ml | state_bottle_cost | state_bottle_retail | bottles_sold | sale_dollars | volume_sold_liters | profit | num_stores | average_store_profit | sales_per_litters | population | store_population_ratio | consumption_per_capita | area | stores_per_area | per_capita_income | median_household_income | median_family_income | number_of_households | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Adair | 4455875 | 40182.62 | 60373.69 | 33807 | 410805.62 | 32522.35 | 137466.12 | 4420 | 31.100932 | 12.631486 | 7092 | 0.623237 | 4.585780 | 1146.149874 | 3.856389 | 23497 | 45202 | 57287 | 3292 |
1 | Adams | 1757403 | 18103.82 | 27171.39 | 8446 | 100596.80 | 7547.62 | 33614.78 | 1797 | 18.706055 | 13.328281 | 3693 | 0.486596 | 2.043764 | 950.420482 | 1.890742 | 23549 | 40368 | 52782 | 1715 |
2 | Allamakee | 9079175 | 85371.49 | 128238.38 | 57833 | 772843.46 | 61269.76 | 258630.44 | 8595 | 30.090802 | 12.613783 | 13884 | 0.619058 | 4.412976 | 1640.663925 | 5.238733 | 21349 | 46623 | 55926 | 5845 |
3 | Appanoose | 8360175 | 82458.44 | 123839.97 | 58261 | 711376.15 | 53184.66 | 237792.71 | 8582 | 27.708309 | 13.375589 | 12462 | 0.688654 | 4.267747 | 1439.625361 | 5.961273 | 20084 | 34689 | 41250 | 5627 |
4 | Audubon | 2043175 | 17737.11 | 26656.07 | 13978 | 159547.63 | 13215.43 | 53393.70 | 2045 | 26.109389 | 12.072829 | 5678 | 0.360162 | 2.327480 | 1011.305224 | 2.022139 | 24207 | 42717 | 58641 | 2617 |
df_new.isnull().any()[df_new.isnull().any() == True]
Series([], dtype: bool)
Look for any statistical relationships, correlations, or other relevant properties of the dataset.
df_final = df_new.copy()
df_final.head()
county | bottle_volume_ml | state_bottle_cost | state_bottle_retail | bottles_sold | sale_dollars | volume_sold_liters | profit | num_stores | average_store_profit | sales_per_litters | population | store_population_ratio | consumption_per_capita | area | stores_per_area | per_capita_income | median_household_income | median_family_income | number_of_households | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Adair | 4455875 | 40182.62 | 60373.69 | 33807 | 410805.62 | 32522.35 | 137466.12 | 4420 | 31.100932 | 12.631486 | 7092 | 0.623237 | 4.585780 | 1146.149874 | 3.856389 | 23497 | 45202 | 57287 | 3292 |
1 | Adams | 1757403 | 18103.82 | 27171.39 | 8446 | 100596.80 | 7547.62 | 33614.78 | 1797 | 18.706055 | 13.328281 | 3693 | 0.486596 | 2.043764 | 950.420482 | 1.890742 | 23549 | 40368 | 52782 | 1715 |
2 | Allamakee | 9079175 | 85371.49 | 128238.38 | 57833 | 772843.46 | 61269.76 | 258630.44 | 8595 | 30.090802 | 12.613783 | 13884 | 0.619058 | 4.412976 | 1640.663925 | 5.238733 | 21349 | 46623 | 55926 | 5845 |
3 | Appanoose | 8360175 | 82458.44 | 123839.97 | 58261 | 711376.15 | 53184.66 | 237792.71 | 8582 | 27.708309 | 13.375589 | 12462 | 0.688654 | 4.267747 | 1439.625361 | 5.961273 | 20084 | 34689 | 41250 | 5627 |
4 | Audubon | 2043175 | 17737.11 | 26656.07 | 13978 | 159547.63 | 13215.43 | 53393.70 | 2045 | 26.109389 | 12.072829 | 5678 | 0.360162 | 2.327480 | 1011.305224 | 2.022139 | 24207 | 42717 | 58641 | 2617 |
cols_to_keep = ['sale_dollars', 'num_stores', 'population', 'store_population_ratio', 'area', u'per_capita_income', u'median_household_income', u'median_family_income']
df_final[cols_to_keep].corr().loc['sale_dollars'].sort_values(ascending=False).iloc[1:]
num_stores 0.995047 population 0.974590 per_capita_income 0.348343 store_population_ratio 0.344187 median_family_income 0.340245 area 0.281804 median_household_income 0.190911 Name: sale_dollars, dtype: float64
Observations:
num_stores
and stores_per_area
are highly correlated. Furthermore, both are highly correlated to the target variable sale_dollars
.store_population_ratio
and consumption_per_capita
.cols_to_keep = ['num_stores', 'store_population_ratio', 'consumption_per_capita', 'stores_per_area', u'per_capita_income']
df_final[cols_to_keep].corr()
num_stores | store_population_ratio | consumption_per_capita | stores_per_area | per_capita_income | |
---|---|---|---|---|---|
num_stores | 1.000000 | 0.366990 | 0.382747 | 0.955223 | 0.345534 |
store_population_ratio | 0.366990 | 1.000000 | 0.891711 | 0.395677 | 0.214897 |
consumption_per_capita | 0.382747 | 0.891711 | 1.000000 | 0.418315 | 0.276315 |
stores_per_area | 0.955223 | 0.395677 | 0.418315 | 1.000000 | 0.358652 |
per_capita_income | 0.345534 | 0.214897 | 0.276315 | 0.358652 | 1.000000 |
sns.heatmap(df_final[cols_to_keep].corr())
<matplotlib.axes._subplots.AxesSubplot at 0x11bc0a090>
cols_to_keep_2 = ['sale_dollars','num_stores', 'store_population_ratio', 'consumption_per_capita', 'stores_per_area', u'per_capita_income', u'median_household_income', u'median_family_income']
df_final[cols_to_keep_2].corr()[['sale_dollars']].iloc[1:]
sale_dollars | |
---|---|
num_stores | 0.995047 |
store_population_ratio | 0.344187 |
consumption_per_capita | 0.388819 |
stores_per_area | 0.961106 |
per_capita_income | 0.348343 |
median_household_income | 0.190911 |
median_family_income | 0.340245 |
sns.heatmap(df_final[cols_to_keep_2].corr()[['sale_dollars']].iloc[1:]);
Let us generate scatter plots for all the predictors. They provide similar information as the correlation matrices.
g = sns.pairplot(df_final[cols_to_keep])
for ax in g.axes.flatten(): # from [6]
for tick in ax.get_xticklabels():
tick.set(rotation=90);
Using scikit-learn or statsmodels, build the necessary models for your scenario. Evaluate model fit.
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
from sklearn.cross_validation import train_test_split
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
from sklearn.grid_search import GridSearchCV
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score,mean_squared_error
from sklearn.cross_validation import cross_val_score
from sklearn.model_selection import cross_val_predict
from sklearn import linear_model
Let us first take a look at the 10 best performing counties (recall that we dropped Polk as an outlier):
df_final_sorted = df_final.copy()
df_final_sorted.sort_values("sale_dollars", inplace=True, ascending=False)
df_final_sorted[['county','sale_dollars']].head(10)
county | sale_dollars | |
---|---|---|
55 | Linn | 1.806775e+07 |
79 | Scott | 1.397399e+07 |
50 | Johnson | 1.237021e+07 |
6 | Black Hawk | 1.192076e+07 |
94 | Woodbury | 7.695638e+06 |
75 | Pottawattamie | 7.675324e+06 |
82 | Story | 6.730961e+06 |
30 | Dubuque | 6.621999e+06 |
16 | Cerro Gordo | 4.892826e+06 |
28 | Des Moines | 3.579586e+06 |
df_final_sorted[['county','sale_dollars']].tail(10)
county | sale_dollars | |
---|---|---|
56 | Louisa | 207146.11 |
26 | Decatur | 176364.48 |
86 | Van Buren | 174245.69 |
4 | Audubon | 159547.63 |
52 | Keokuk | 149560.78 |
77 | Ringgold | 138680.81 |
84 | Taylor | 110598.35 |
90 | Wayne | 105438.35 |
1 | Adams | 100596.80 |
25 | Davis | 96185.96 |
print df_final.columns.tolist()
['county', 'bottle_volume_ml', 'state_bottle_cost', 'state_bottle_retail', 'bottles_sold', 'sale_dollars', 'volume_sold_liters', 'profit', 'num_stores', 'average_store_profit', 'sales_per_litters', 'population', 'store_population_ratio', 'consumption_per_capita', 'area', 'stores_per_area', u'per_capita_income', u'median_household_income', u'median_family_income', u'number_of_households']
features = ['num_stores','population', 'store_population_ratio', \
'consumption_per_capita', 'stores_per_area', u'per_capita_income']
X = df_final[features]
y = df_final['sale_dollars']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.3)
combs = []
for num in range(1,len(features)+1):
combs.append([i[0] for i in list(itertools.combinations(features, num))])
We now create the training and testing data, instantiate the models and test all models you want:
lr = linear_model.LinearRegression(normalize=True)
ridge = linear_model.RidgeCV(cv=5)
lasso = linear_model.LassoCV(cv=5)
models = [lr,lasso,ridge]
Make a list of $R^2$ combinations:
r2_comb_lst = []
for comb in combs:
for m in models:
model = m.fit(X_train[comb],y_train)
r2 = m.score(X_test[comb], y_test)
r2_comb_lst.append([round(r2,3),comb,str(model).split('(')[0]])
Find the best predictors using itemgetter
:
import operator
r2_comb_lst.sort(key=operator.itemgetter(1))
print r2_comb_lst[-1]
[0.974, ['num_stores', 'population', 'store_population_ratio', 'consumption_per_capita', 'stores_per_area', u'per_capita_income'], 'RidgeCV']
The best predictors are:
r2_comb_lst[-1][1]
['num_stores', 'population', 'store_population_ratio', 'consumption_per_capita', 'stores_per_area', u'per_capita_income']
Note: Using
str(model).split('(')[0]
on something like:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=True))
we can extract only the model since we convert it to a string, split on (
and get the first element
Dropping highly correlated predictors:
features=['population','store_population_ratio','stores_per_area',u'per_capita_income']
X = df_final[features]
y = df_final['sale_dollars']
ridge = linear_model.RidgeCV(cv=5)
model = ridge.fit(X,y)
print 'r-squared: {}'.format(model.score(X,y))
r-squared: 0.98031775254
coefficients = pd.DataFrame([features, model.coef_.tolist()], index=['feature', 'coefficients']).T
coefficients
feature | coefficients | |
---|---|---|
0 | population | 58.8369 |
1 | store_population_ratio | 1.05083e+06 |
2 | stores_per_area | 49967.5 |
3 | per_capita_income | -71.6356 |
coefficients['coefficients'] = coefficients['coefficients'].astype(float)
coefficients = coefficients.sort_values(by='coefficients', ascending=False)
Again make sure that you record any valuable information. For example, in the tax scenario, did you find the sales from the first three months of the year to be a good predictor of the total sales for the year? Plot the predictions versus the true values and discuss the successes and limitations of your models
predicted = model.predict(X)
fig, ax = plt.subplots(figsize=(5,5));
plt.scatter(predicted, y);
plt.plot([min(y), max(y)], [min(y), max(y)], '-');
plt.title('Predicted and actual sales');
plt.xlabel('Predicted sales');
plt.ylabel('Actual sales');
pred = [round(p,0) for p in predicted]
df_with_pred = df_final.copy()
df_with_pred['sales_prediction_dollars'] = pred
df_with_pred
county | bottle_volume_ml | state_bottle_cost | state_bottle_retail | bottles_sold | sale_dollars | volume_sold_liters | profit | num_stores | average_store_profit | sales_per_litters | population | store_population_ratio | consumption_per_capita | area | stores_per_area | per_capita_income | median_household_income | median_family_income | number_of_households | sales_prediction_dollars | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Adair | 4455875 | 40182.62 | 6.037369e+04 | 33807 | 4.108056e+05 | 32522.35 | 1.374661e+05 | 4420 | 31.100932 | 12.631486 | 7092 | 0.623237 | 4.585780 | 1146.149874 | 3.856389 | 23497 | 45202 | 57287 | 3292 | 221482.0 |
1 | Adams | 1757403 | 18103.82 | 2.717139e+04 | 8446 | 1.005968e+05 | 7547.62 | 3.361478e+04 | 1797 | 18.706055 | 13.328281 | 3693 | 0.486596 | 2.043764 | 950.420482 | 1.890742 | 23549 | 40368 | 52782 | 1715 | -224036.0 |
2 | Allamakee | 9079175 | 85371.49 | 1.282384e+05 | 57833 | 7.728435e+05 | 61269.76 | 2.586304e+05 | 8595 | 30.090802 | 12.613783 | 13884 | 0.619058 | 4.412976 | 1640.663925 | 5.238733 | 21349 | 46623 | 55926 | 5845 | 839656.0 |
3 | Appanoose | 8360175 | 82458.44 | 1.238400e+05 | 58261 | 7.113762e+05 | 53184.66 | 2.377927e+05 | 8582 | 27.708309 | 13.375589 | 12462 | 0.688654 | 4.267747 | 1439.625361 | 5.961273 | 20084 | 34689 | 41250 | 5627 | 955846.0 |
4 | Audubon | 2043175 | 17737.11 | 2.665607e+04 | 13978 | 1.595476e+05 | 13215.43 | 5.339370e+04 | 2045 | 26.109389 | 12.072829 | 5678 | 0.360162 | 2.327480 | 1011.305224 | 2.022139 | 24207 | 42717 | 58641 | 2617 | -280677.0 |
5 | Benton | 7790725 | 71349.04 | 1.071595e+05 | 47699 | 5.803559e+05 | 47612.84 | 1.940870e+05 | 7731 | 25.105032 | 12.189062 | 25699 | 0.300829 | 1.852712 | 1589.039934 | 4.865202 | 25111 | 54726 | 64970 | 10302 | 912250.0 |
6 | Black Hawk | 100801227 | 1107541.37 | 1.663275e+06 | 1093412 | 1.192076e+07 | 793399.13 | 3.982880e+06 | 118229 | 33.687840 | 15.024919 | 132904 | 0.889582 | 5.969716 | 1658.463133 | 71.288290 | 23357 | 44178 | 57495 | 52470 | 11283189.0 |
7 | Boone | 16005831 | 154407.11 | 2.319393e+05 | 120271 | 1.534234e+06 | 115554.84 | 5.129844e+05 | 16515 | 31.061724 | 13.277107 | 26532 | 0.622456 | 4.355301 | 1172.618576 | 14.083864 | 25998 | 49578 | 66872 | 10728 | 1696331.0 |
8 | Bremer | 17767453 | 173326.87 | 2.603925e+05 | 113551 | 1.487074e+06 | 114021.43 | 4.976283e+05 | 17746 | 28.041717 | 13.042060 | 24798 | 0.715622 | 4.598009 | 1254.612988 | 14.144601 | 26522 | 55676 | 68602 | 9385 | 1657708.0 |
9 | Buchanan | 12168450 | 121504.75 | 1.824907e+05 | 89429 | 1.140914e+06 | 84484.18 | 3.814875e+05 | 12653 | 30.149961 | 13.504469 | 20992 | 0.602753 | 4.024589 | 1437.306002 | 8.803275 | 23437 | 51961 | 61421 | 8161 | 1269271.0 |
10 | Buena Vista | 20832831 | 226554.99 | 3.402451e+05 | 114147 | 1.532283e+06 | 105358.28 | 5.121251e+05 | 21598 | 23.711690 | 14.543544 | 20332 | 1.062266 | 5.181895 | 1641.768588 | 13.155325 | 21256 | 43182 | 53382 | 7522 | 2087010.0 |
11 | Butler | 3348800 | 27994.24 | 4.206102e+04 | 24615 | 2.786351e+05 | 25010.20 | 9.331105e+04 | 3308 | 28.207693 | 11.140858 | 14791 | 0.223650 | 1.690907 | 1714.432829 | 1.929501 | 24030 | 47702 | 59641 | 6120 | 120103.0 |
12 | Calhoun | 3787450 | 30301.41 | 4.553630e+04 | 27214 | 3.215042e+05 | 27977.64 | 1.075994e+05 | 3625 | 29.682604 | 11.491468 | 9846 | 0.368170 | 2.841523 | 1593.242060 | 2.275235 | 23049 | 41611 | 50037 | 4242 | 68571.0 |
13 | Carroll | 14460725 | 143097.04 | 2.149119e+05 | 114246 | 1.556718e+06 | 115511.56 | 5.202122e+05 | 14568 | 35.709236 | 13.476729 | 20437 | 0.712825 | 5.652080 | 1790.407255 | 8.136696 | 25094 | 47507 | 61960 | 8683 | 1200276.0 |
14 | Cass | 10038675 | 98657.96 | 1.481477e+05 | 74583 | 1.005639e+06 | 75880.71 | 3.360579e+05 | 9988 | 33.646165 | 13.252889 | 13157 | 0.759140 | 5.767326 | 1776.544213 | 5.622151 | 21787 | 40820 | 48884 | 5980 | 931866.0 |
15 | Cedar | 7427125 | 67426.72 | 1.013055e+05 | 44550 | 4.927014e+05 | 38720.95 | 1.647957e+05 | 8160 | 20.195545 | 12.724413 | 18454 | 0.442181 | 2.098242 | 1232.378198 | 6.621344 | 24742 | 54321 | 63893 | 7511 | 748698.0 |
16 | Cerro Gordo | 50439956 | 514818.96 | 7.732206e+05 | 377860 | 4.892826e+06 | 354343.00 | 1.635296e+06 | 51566 | 31.712671 | 13.808162 | 43070 | 1.197260 | 8.227142 | 1497.770139 | 34.428514 | 25463 | 44741 | 60148 | 19350 | 4328297.0 |
17 | Cherokee | 7638400 | 72195.69 | 1.084669e+05 | 58423 | 6.855181e+05 | 52491.70 | 2.294092e+05 | 7408 | 30.967770 | 13.059553 | 11508 | 0.643726 | 4.561323 | 1404.657609 | 5.273883 | 24507 | 44635 | 56696 | 5207 | 501313.0 |
18 | Chickasaw | 4325975 | 37139.09 | 5.580234e+04 | 27003 | 3.508606e+05 | 30429.56 | 1.173801e+05 | 4031 | 29.119360 | 11.530256 | 12023 | 0.335274 | 2.530946 | 1332.800885 | 3.024458 | 22447 | 41372 | 50530 | 5204 | 242653.0 |
19 | Clarke | 5656450 | 54720.08 | 8.218209e+04 | 37286 | 5.035890e+05 | 36768.25 | 1.683819e+05 | 5600 | 30.068200 | 13.696300 | 9309 | 0.601568 | 3.949753 | 954.543677 | 5.866678 | 23271 | 45596 | 54707 | 3701 | 445791.0 |
20 | Clay | 15084525 | 151187.65 | 2.270741e+05 | 104645 | 1.367561e+06 | 104603.09 | 4.569864e+05 | 15276 | 29.915320 | 13.073807 | 16333 | 0.935284 | 6.404402 | 1305.389452 | 11.702255 | 25398 | 43542 | 56460 | 7282 | 1348963.0 |
21 | Clayton | 11180875 | 108046.71 | 1.622809e+05 | 49974 | 6.927068e+05 | 52494.39 | 2.315675e+05 | 10480 | 22.096132 | 13.195825 | 17590 | 0.595793 | 2.984331 | 2086.377422 | 5.023060 | 22303 | 45873 | 53905 | 7599 | 954141.0 |
22 | Clinton | 25808600 | 255757.84 | 3.841091e+05 | 245180 | 2.928261e+06 | 214882.76 | 9.784796e+05 | 28364 | 34.497237 | 13.627250 | 47309 | 0.599548 | 4.542112 | 1701.089895 | 16.674016 | 23573 | 46170 | 58681 | 20223 | 3197853.0 |
23 | Crawford | 9178228 | 91880.62 | 1.379796e+05 | 65790 | 8.643920e+05 | 62535.95 | 2.889041e+05 | 9360 | 30.865818 | 13.822321 | 16940 | 0.552538 | 3.691615 | 1794.525720 | 5.215863 | 21181 | 44377 | 53794 | 6413 | 960452.0 |
24 | Dallas | 19782703 | 208239.38 | 3.128030e+05 | 166032 | 2.295133e+06 | 151828.96 | 7.673762e+05 | 21284 | 36.054136 | 15.116572 | 84516 | 0.251834 | 1.796452 | 1727.389531 | 12.321483 | 33051 | 67037 | 84018 | 25240 | 4125161.0 |
25 | Davis | 1623500 | 14206.10 | 2.134244e+04 | 7746 | 9.618596e+04 | 8004.00 | 3.217266e+04 | 1555 | 20.689814 | 12.017236 | 8860 | 0.175508 | 0.903386 | 1202.497109 | 1.293142 | 21970 | 46597 | 52855 | 3201 | -163676.0 |
26 | Decatur | 1798250 | 17836.60 | 2.679681e+04 | 14389 | 1.763645e+05 | 12243.11 | 5.898266e+04 | 1994 | 29.580070 | 14.405203 | 8141 | 0.244933 | 1.503883 | 1443.427332 | 1.381434 | 18195 | 37138 | 48015 | 3223 | 141811.0 |
27 | Delaware | 5929650 | 56982.70 | 8.557961e+04 | 55212 | 7.547227e+05 | 54845.42 | 2.522007e+05 | 5749 | 43.868622 | 13.760907 | 17327 | 0.331794 | 3.165315 | 1275.461107 | 4.507389 | 22578 | 47078 | 59802 | 7062 | 615781.0 |
28 | Des Moines | 28965403 | 311376.14 | 4.674807e+05 | 294257 | 3.579586e+06 | 235900.79 | 1.195418e+06 | 32474 | 36.811526 | 15.174117 | 39739 | 0.817182 | 5.936254 | 928.456413 | 34.976332 | 22555 | 41937 | 53946 | 17003 | 3968601.0 |
29 | Dickinson | 27326006 | 291810.42 | 4.382073e+05 | 222744 | 3.145660e+06 | 224310.44 | 1.050838e+06 | 28110 | 37.383072 | 14.023690 | 17243 | 1.630227 | 13.008783 | 1024.114722 | 27.448097 | 29459 | 50174 | 59648 | 7554 | 2628643.0 |
30 | Dubuque | 57322060 | 601825.15 | 9.038367e+05 | 501274 | 6.621999e+06 | 460080.11 | 2.212785e+06 | 60753 | 36.422653 | 14.393144 | 97003 | 0.626300 | 4.742947 | 1445.975306 | 42.015240 | 25045 | 48573 | 61138 | 36815 | 7310597.0 |
31 | Emmet | 5982525 | 60879.97 | 9.144739e+04 | 38693 | 5.019728e+05 | 39072.45 | 1.678386e+05 | 6071 | 27.645955 | 12.847230 | 9658 | 0.628598 | 4.045605 | 1130.769046 | 5.368912 | 24371 | 42286 | 55844 | 4236 | 391058.0 |
32 | Fayette | 9109128 | 81374.00 | 1.222869e+05 | 70231 | 8.602044e+05 | 68175.75 | 2.878219e+05 | 8730 | 32.969291 | 12.617454 | 20054 | 0.435325 | 3.399609 | 1955.698767 | 4.463878 | 21566 | 41055 | 52627 | 8634 | 955344.0 |
33 | Floyd | 8058678 | 79010.19 | 1.186743e+05 | 82689 | 1.032124e+06 | 74542.85 | 3.450133e+05 | 8223 | 41.957103 | 13.846056 | 15873 | 0.518050 | 4.696204 | 1245.579774 | 6.601745 | 21416 | 39467 | 52808 | 6886 | 913846.0 |
34 | Franklin | 6146325 | 60180.12 | 9.038277e+04 | 36313 | 4.522624e+05 | 35596.58 | 1.512945e+05 | 6052 | 24.999091 | 12.705219 | 10170 | 0.595084 | 3.500155 | 1346.541996 | 4.494475 | 22507 | 44863 | 52917 | 4332 | 475800.0 |
35 | Greene | 5200525 | 53392.20 | 8.019890e+04 | 35511 | 4.774988e+05 | 33966.78 | 1.596316e+05 | 5248 | 30.417611 | 14.057818 | 9011 | 0.582399 | 3.769480 | 1385.511780 | 3.787770 | 23947 | 43286 | 60133 | 3996 | 255811.0 |
36 | Grundy | 4369250 | 36670.31 | 5.510175e+04 | 26210 | 3.018362e+05 | 26267.96 | 1.009891e+05 | 4341 | 23.264027 | 11.490660 | 12313 | 0.352554 | 2.133352 | 1097.912646 | 3.953866 | 26916 | 56184 | 68151 | 5131 | 4175.0 |
37 | Guthrie | 3493600 | 31390.41 | 4.717285e+04 | 24678 | 2.940075e+05 | 23023.47 | 9.837735e+04 | 3398 | 28.951545 | 12.769903 | 10625 | 0.319812 | 2.166915 | 1659.326695 | 2.047819 | 26590 | 50090 | 61951 | 4544 | -201436.0 |
38 | Hamilton | 7882325 | 77319.08 | 1.161721e+05 | 61011 | 7.369458e+05 | 56950.35 | 2.466456e+05 | 8603 | 28.669721 | 12.940146 | 15076 | 0.570642 | 3.777550 | 1425.376163 | 6.035600 | 24765 | 46188 | 61472 | 6540 | 654023.0 |
39 | Hancock | 3509425 | 28221.14 | 4.241179e+04 | 26820 | 3.061684e+05 | 28072.83 | 1.025907e+05 | 3101 | 33.083089 | 10.906219 | 10835 | 0.286202 | 2.590940 | 1435.986414 | 2.159491 | 22713 | 47318 | 55922 | 4741 | 58912.0 |
40 | Hardin | 15096700 | 136991.10 | 2.058020e+05 | 119772 | 1.657238e+06 | 127383.22 | 5.542669e+05 | 13592 | 40.778908 | 13.009862 | 17226 | 0.789040 | 7.394823 | 1749.403531 | 7.769505 | 24154 | 44694 | 57612 | 7296 | 1140430.0 |
41 | Harrison | 8333900 | 82437.45 | 1.238054e+05 | 42380 | 5.046314e+05 | 38339.84 | 1.687135e+05 | 9131 | 18.476999 | 13.162064 | 14149 | 0.645346 | 2.709721 | 2000.716609 | 4.563865 | 24221 | 51303 | 63283 | 5987 | 643413.0 |
42 | Henry | 8214425 | 84936.53 | 1.275347e+05 | 72982 | 9.634271e+05 | 67508.46 | 3.218625e+05 | 9026 | 35.659479 | 14.271205 | 19773 | 0.456481 | 3.414174 | 1374.637080 | 6.566097 | 23056 | 41983 | 53985 | 7666 | 959348.0 |
43 | Howard | 5501853 | 50667.53 | 7.612342e+04 | 38946 | 5.268240e+05 | 42435.81 | 1.762160e+05 | 5197 | 33.907264 | 12.414609 | 9332 | 0.556901 | 4.547344 | 1553.199575 | 3.345996 | 22417 | 46068 | 55582 | 3944 | 335431.0 |
44 | Humboldt | 4838675 | 46974.26 | 7.055336e+04 | 40013 | 5.050238e+05 | 38779.60 | 1.687570e+05 | 4967 | 33.975637 | 13.022925 | 9487 | 0.523559 | 4.087657 | 1325.822767 | 3.746353 | 24568 | 45282 | 57063 | 4209 | 175430.0 |
45 | Ida | 4907150 | 49504.88 | 7.433259e+04 | 35771 | 4.983864e+05 | 34949.87 | 1.665311e+05 | 4777 | 34.861015 | 14.260037 | 6985 | 0.683894 | 5.003560 | 1061.883800 | 4.498609 | 23841 | 44521 | 58635 | 3052 | 286374.0 |
46 | Iowa | 11862800 | 102657.53 | 1.542814e+05 | 74451 | 9.711115e+05 | 77025.04 | 3.251289e+05 | 11162 | 29.128192 | 12.607737 | 16311 | 0.684323 | 4.722276 | 1448.032410 | 7.708391 | 26721 | 56053 | 64578 | 6677 | 789613.0 |
47 | Jackson | 12728325 | 114843.67 | 1.724907e+05 | 89098 | 1.130876e+06 | 88084.97 | 3.778944e+05 | 12593 | 30.008287 | 12.838469 | 19472 | 0.646724 | 4.523673 | 2075.514783 | 6.067410 | 23008 | 42489 | 54210 | 8289 | 1120072.0 |
48 | Jasper | 21363900 | 214003.36 | 3.214453e+05 | 131828 | 1.577656e+06 | 121386.17 | 5.274477e+05 | 23046 | 22.886734 | 12.996996 | 36708 | 0.627820 | 3.306804 | 1677.247032 | 13.740373 | 23160 | 46396 | 56484 | 14806 | 2486830.0 |
49 | Jefferson | 6377025 | 66329.72 | 9.959432e+04 | 56831 | 7.276337e+05 | 51721.61 | 2.430598e+05 | 6446 | 37.707074 | 14.068272 | 18090 | 0.356329 | 2.859127 | 1228.460352 | 5.247219 | 23853 | 44167 | 55352 | 6846 | 632088.0 |
50 | Johnson | 94201143 | 1094410.17 | 1.643462e+06 | 927749 | 1.237021e+07 | 784403.93 | 4.133645e+06 | 105955 | 39.013213 | 15.770199 | 146547 | 0.723010 | 5.352576 | 1374.398467 | 77.091908 | 28008 | 51380 | 74547 | 52715 | 11867678.0 |
51 | Jones | 15309100 | 144067.53 | 2.164019e+05 | 69101 | 8.898679e+05 | 68061.75 | 2.974959e+05 | 15545 | 19.137725 | 13.074420 | 20439 | 0.760556 | 3.329994 | 1603.902991 | 9.691983 | 22873 | 47955 | 59167 | 8181 | 1487368.0 |
52 | Keokuk | 2720900 | 23814.27 | 3.576359e+04 | 13554 | 1.495608e+05 | 13026.40 | 4.999794e+04 | 2797 | 17.875560 | 11.481359 | 10119 | 0.276411 | 1.287321 | 1568.990538 | 1.782675 | 22088 | 42698 | 53456 | 4408 | 32440.0 |
53 | Kossuth | 14172628 | 142789.09 | 2.144583e+05 | 105881 | 1.479812e+06 | 108007.85 | 4.948291e+05 | 13720 | 36.066263 | 13.700965 | 15114 | 0.907768 | 7.146212 | 2171.379858 | 6.318563 | 27415 | 48277 | 61012 | 6697 | 834826.0 |
54 | Lee | 24974375 | 267566.06 | 4.017307e+05 | 228466 | 2.902471e+06 | 201627.37 | 9.692238e+05 | 27261 | 35.553493 | 14.395224 | 34615 | 0.787549 | 5.824855 | 1207.589119 | 22.574731 | 21324 | 42444 | 50630 | 14610 | 3104487.0 |
55 | Linn | 158353452 | 1731106.01 | 2.599613e+06 | 1500058 | 1.806775e+07 | 1221208.85 | 6.036068e+06 | 184092 | 32.788321 | 14.794975 | 221661 | 0.830511 | 5.509354 | 2164.866374 | 85.036195 | 28239 | 53674 | 69250 | 86134 | 16780528.0 |
56 | Louisa | 3165850 | 31041.02 | 4.661010e+04 | 20602 | 2.071461e+05 | 14809.68 | 6.917951e+04 | 3730 | 18.546786 | 13.987210 | 11142 | 0.334769 | 1.329176 | 980.907015 | 3.802603 | 20367 | 50457 | 54923 | 4346 | 378171.0 |
57 | Lucas | 3968675 | 40531.53 | 6.087117e+04 | 29011 | 3.761450e+05 | 27391.10 | 1.257357e+05 | 4064 | 30.938898 | 13.732379 | 8647 | 0.469990 | 3.167700 | 1154.264969 | 3.520855 | 19967 | 43005 | 56647 | 3689 | 388043.0 |
58 | Lyon | 9776875 | 98319.84 | 1.476554e+05 | 55318 | 7.766658e+05 | 56613.63 | 2.593880e+05 | 9647 | 26.887943 | 13.718707 | 11754 | 0.820742 | 4.816542 | 1520.471701 | 6.344742 | 21613 | 49506 | 57348 | 4442 | 962622.0 |
59 | Madison | 7396475 | 74647.41 | 1.121233e+05 | 54411 | 6.856734e+05 | 51462.23 | 2.293326e+05 | 7795 | 29.420481 | 13.323817 | 15848 | 0.491860 | 3.247238 | 1409.747269 | 5.529360 | 25711 | 53183 | 67099 | 6025 | 523595.0 |
60 | Mahaska | 9398550 | 96610.21 | 1.451289e+05 | 70861 | 8.393837e+05 | 62911.85 | 2.806475e+05 | 9670 | 29.022495 | 13.342219 | 22181 | 0.435959 | 2.836295 | 1216.496254 | 7.949059 | 21568 | 45025 | 57877 | 8975 | 1255159.0 |
61 | Marion | 19736978 | 199812.05 | 3.000452e+05 | 133175 | 1.735448e+06 | 123141.99 | 5.798673e+05 | 20958 | 27.668067 | 14.093066 | 33189 | 0.631474 | 3.710325 | 1528.879832 | 13.708075 | 24613 | 53370 | 65817 | 12723 | 2177923.0 |
62 | Marshall | 22552328 | 231546.74 | 3.477674e+05 | 192655 | 2.504138e+06 | 168340.47 | 8.368677e+05 | 23507 | 35.600787 | 14.875434 | 40312 | 0.583127 | 4.175939 | 1520.107767 | 15.464035 | 22407 | 45232 | 55716 | 15538 | 2791982.0 |
63 | Mills | 4317200 | 38409.79 | 5.770141e+04 | 37547 | 4.345716e+05 | 34164.64 | 1.452974e+05 | 3968 | 36.617293 | 12.719923 | 14972 | 0.265028 | 2.281902 | 1220.953340 | 3.249919 | 25400 | 59481 | 73532 | 5605 | 142072.0 |
64 | Mitchell | 8198000 | 80007.93 | 1.201944e+05 | 32335 | 4.147482e+05 | 34245.82 | 1.387916e+05 | 8227 | 16.870258 | 12.110916 | 10763 | 0.764378 | 3.181810 | 1086.794650 | 7.569967 | 22820 | 48506 | 63356 | 4395 | 819843.0 |
65 | Monona | 9671381 | 95956.98 | 1.440858e+05 | 42020 | 5.197005e+05 | 38043.92 | 1.736055e+05 | 10023 | 17.320713 | 13.660539 | 8898 | 1.126433 | 4.275559 | 1656.794562 | 6.049634 | 22774 | 41398 | 51098 | 4050 | 1017900.0 |
66 | Monroe | 2891525 | 28812.61 | 4.327483e+04 | 21971 | 2.896006e+05 | 20937.96 | 9.680632e+04 | 3000 | 32.268773 | 13.831367 | 7870 | 0.381194 | 2.660478 | 960.814655 | 3.122350 | 21228 | 43245 | 53052 | 3213 | 138773.0 |
67 | Montgomery | 6457078 | 67065.99 | 1.006981e+05 | 45167 | 5.570222e+05 | 41624.63 | 1.861282e+05 | 6713 | 27.726534 | 13.382034 | 10225 | 0.656528 | 4.070868 | 1140.359482 | 5.886740 | 21301 | 38624 | 50595 | 4558 | 699564.0 |
68 | Muscatine | 28497706 | 294376.85 | 4.421020e+05 | 208296 | 2.626114e+06 | 188612.00 | 8.775137e+05 | 31591 | 27.777331 | 13.923365 | 42940 | 0.735701 | 4.392455 | 1700.941831 | 18.572652 | 24138 | 51025 | 61445 | 16412 | 3138265.0 |
69 | O'Brien | 13896728 | 132245.41 | 1.986505e+05 | 83029 | 1.057797e+06 | 84579.60 | 3.539820e+05 | 13954 | 25.367782 | 12.506521 | 14020 | 0.995292 | 6.032782 | 1620.431744 | 8.611285 | 24771 | 44018 | 59391 | 6069 | 1166399.0 |
70 | Osceola | 3107475 | 29141.20 | 4.377315e+04 | 19244 | 2.562448e+05 | 20104.98 | 8.574306e+04 | 2981 | 28.763187 | 12.745341 | 6064 | 0.491590 | 3.315465 | 967.452680 | 3.081288 | 23063 | 43889 | 58286 | 2682 | 15017.0 |
71 | Page | 10230328 | 103322.96 | 1.551729e+05 | 81116 | 1.057103e+06 | 75315.10 | 3.532156e+05 | 10651 | 33.162667 | 14.035741 | 15391 | 0.692028 | 4.893451 | 1398.118634 | 7.618095 | 21204 | 40778 | 52791 | 6393 | 1134280.0 |
72 | Palo Alto | 9396700 | 89935.75 | 1.350478e+05 | 41949 | 5.750134e+05 | 44627.15 | 1.921719e+05 | 8667 | 22.172831 | 12.884834 | 9047 | 0.957997 | 4.932812 | 1596.128377 | 5.430014 | 23071 | 42800 | 57208 | 3994 | 797432.0 |
73 | Plymouth | 12560609 | 132624.74 | 1.991814e+05 | 94325 | 1.286709e+06 | 93061.03 | 4.300381e+05 | 13176 | 32.637984 | 13.826506 | 25200 | 0.522857 | 3.692898 | 2259.953641 | 5.830208 | 28060 | 56379 | 69261 | 9875 | 953171.0 |
74 | Pocahontas | 4312750 | 37584.77 | 5.645243e+04 | 24363 | 2.971895e+05 | 25680.47 | 9.933154e+04 | 4167 | 23.837663 | 11.572587 | 6886 | 0.605141 | 3.729374 | 1297.431413 | 3.211730 | 23385 | 42105 | 56250 | 3233 | 166156.0 |
75 | Pottawattamie | 64280187 | 680102.80 | 1.021141e+06 | 618358 | 7.675324e+06 | 521376.00 | 2.563352e+06 | 72472 | 35.370233 | 14.721285 | 93582 | 0.774422 | 5.571328 | 2303.225231 | 31.465442 | 23782 | 48728 | 60354 | 36775 | 6828297.0 |
76 | Poweshiek | 16062900 | 169084.71 | 2.539540e+05 | 97474 | 1.306022e+06 | 91675.42 | 4.365483e+05 | 17496 | 24.951321 | 14.246155 | 18533 | 0.944046 | 4.946604 | 1580.076857 | 11.072879 | 25218 | 50998 | 65744 | 7555 | 1469057.0 |
77 | Ringgold | 1689850 | 14072.68 | 2.115156e+04 | 10061 | 1.386808e+05 | 11080.29 | 4.642018e+04 | 1476 | 31.449986 | 12.515991 | 5068 | 0.291239 | 2.186324 | 1198.624350 | 1.231412 | 21858 | 42336 | 51269 | 2047 | -260232.0 |
78 | Sac | 7934625 | 75245.54 | 1.130183e+05 | 39571 | 5.258452e+05 | 42670.28 | 1.760915e+05 | 7157 | 24.604098 | 12.323453 | 9876 | 0.724686 | 4.320603 | 1412.599164 | 5.066547 | 23837 | 42986 | 54304 | 4482 | 528002.0 |
79 | Scott | 112482102 | 1247750.11 | 1.873693e+06 | 1237025 | 1.397399e+07 | 914829.35 | 4.667888e+06 | 130419 | 35.791469 | 15.274967 | 172474 | 0.756166 | 5.304158 | 1211.649675 | 107.637548 | 27408 | 49964 | 64513 | 66765 | 14997255.0 |
80 | Shelby | 5286953 | 53350.35 | 8.009987e+04 | 42963 | 5.279510e+05 | 38570.77 | 1.762560e+05 | 5627 | 31.323259 | 13.687852 | 11800 | 0.476864 | 3.268709 | 1299.197014 | 4.331137 | 22389 | 44085 | 55523 | 5085 | 447766.0 |
81 | Sioux | 10681100 | 102003.82 | 1.532028e+05 | 77121 | 1.069835e+06 | 79019.86 | 3.575951e+05 | 10307 | 34.694396 | 13.538806 | 34898 | 0.295346 | 2.264309 | 2056.178194 | 5.012698 | 21333 | 51557 | 60043 | 11584 | 1725739.0 |
82 | Story | 65551096 | 727466.68 | 1.092625e+06 | 501832 | 6.730961e+06 | 456414.62 | 2.250349e+06 | 71636 | 31.413665 | 14.747469 | 97090 | 0.737831 | 4.700944 | 1943.690095 | 36.855670 | 25450 | 48248 | 74278 | 34736 | 7146093.0 |
83 | Tama | 7242875 | 69664.27 | 1.046049e+05 | 47692 | 5.735157e+05 | 43573.67 | 1.916543e+05 | 7655 | 25.036489 | 13.161979 | 17319 | 0.442000 | 2.515946 | 2088.226617 | 3.665790 | 23041 | 46288 | 55011 | 6947 | 655898.0 |
84 | Taylor | 2073825 | 20885.05 | 3.135727e+04 | 9360 | 1.105983e+05 | 8433.72 | 3.693200e+04 | 2304 | 16.029514 | 13.113828 | 6216 | 0.370656 | 1.356776 | 1571.159542 | 1.466433 | 21335 | 40300 | 48156 | 2679 | -60024.0 |
85 | Union | 9569175 | 90467.23 | 1.358550e+05 | 66536 | 9.021116e+05 | 68179.74 | 3.015172e+05 | 9024 | 33.412815 | 13.231373 | 12420 | 0.726570 | 5.489512 | 1292.945614 | 6.979412 | 20435 | 40879 | 50546 | 5271 | 1018948.0 |
86 | Van Buren | 1883075 | 18970.16 | 2.848695e+04 | 12806 | 1.742457e+05 | 12768.44 | 5.823345e+04 | 1920 | 30.329922 | 13.646592 | 7271 | 0.264063 | 1.756078 | 1549.129170 | 1.239406 | 20209 | 40073 | 50064 | 3108 | -40646.0 |
87 | Wapello | 24021278 | 251492.94 | 3.776542e+05 | 186724 | 2.292624e+06 | 160501.09 | 7.658786e+05 | 27843 | 27.507043 | 14.284163 | 34982 | 0.795924 | 4.588105 | 1104.862750 | 25.200415 | 22376 | 40093 | 49309 | 14552 | 3190720.0 |
88 | Warren | 19568800 | 183359.55 | 2.753840e+05 | 159019 | 1.939043e+06 | 146705.53 | 6.481145e+05 | 19620 | 33.033359 | 13.217243 | 49691 | 0.394840 | 2.952356 | 1617.493665 | 12.129877 | 28798 | 62034 | 74042 | 17262 | 2521533.0 |
89 | Washington | 10984531 | 119302.23 | 1.791272e+05 | 84389 | 1.140731e+06 | 78241.80 | 3.810510e+05 | 11745 | 32.443678 | 14.579561 | 22281 | 0.527131 | 3.511593 | 1666.506696 | 7.047676 | 23979 | 50710 | 60466 | 8741 | 1139096.0 |
90 | Wayne | 1344500 | 12518.51 | 1.880652e+04 | 8283 | 1.054384e+05 | 8001.09 | 3.526299e+04 | 1268 | 27.809929 | 13.177998 | 6452 | 0.196528 | 1.240095 | 1438.930980 | 0.881210 | 18795 | 35425 | 44784 | 2652 | -76407.0 |
91 | Webster | 22537959 | 231995.40 | 3.484066e+05 | 228252 | 2.705835e+06 | 189009.94 | 9.042916e+05 | 24106 | 37.513134 | 14.315835 | 36769 | 0.655607 | 5.140470 | 1793.295141 | 13.442294 | 22653 | 40806 | 54129 | 15580 | 2541044.0 |
92 | Winnebago | 8009300 | 72833.61 | 1.094025e+05 | 55931 | 6.827292e+05 | 56888.63 | 2.282976e+05 | 7667 | 29.776648 | 12.001155 | 10631 | 0.721193 | 5.351202 | 1295.390700 | 5.918678 | 22684 | 41871 | 58700 | 4597 | 693928.0 |
93 | Winneshiek | 11450225 | 107156.18 | 1.609449e+05 | 86483 | 1.189769e+06 | 89488.04 | 3.977413e+05 | 10541 | 37.732790 | 13.295288 | 20561 | 0.512670 | 4.352319 | 1587.428273 | 6.640300 | 23608 | 50693 | 61558 | 7997 | 1028921.0 |
94 | Woodbury | 61274387 | 672801.78 | 1.010242e+06 | 620680 | 7.695638e+06 | 512631.64 | 2.570501e+06 | 67732 | 37.951055 | 15.012023 | 102779 | 0.659006 | 4.987708 | 2591.144872 | 26.139797 | 22069 | 44343 | 55957 | 39052 | 7104739.0 |
95 | Worth | 3247475 | 28918.72 | 4.343867e+04 | 20112 | 2.466353e+05 | 20602.25 | 8.250434e+04 | 3000 | 27.501447 | 11.971281 | 7572 | 0.396197 | 2.720847 | 925.069678 | 3.242999 | 27240 | 49673 | 56659 | 3172 | -287641.0 |
96 | Wright | 5575150 | 51158.24 | 7.684774e+04 | 46517 | 5.706450e+05 | 44906.82 | 1.908072e+05 | 5461 | 34.939967 | 12.707313 | 12779 | 0.427342 | 3.514111 | 1542.560045 | 3.540219 | 23068 | 44035 | 53890 | 5625 | 365167.0 |
We notice that some of the predicted values are negative. This also happened in other works [1] and the reason is not quite clear. We remove them.
df_pred = df_with_pred[df_with_pred['sales_prediction_dollars']>0]
df_pred
county | bottle_volume_ml | state_bottle_cost | state_bottle_retail | bottles_sold | sale_dollars | volume_sold_liters | profit | num_stores | average_store_profit | sales_per_litters | population | store_population_ratio | consumption_per_capita | area | stores_per_area | per_capita_income | median_household_income | median_family_income | number_of_households | sales_prediction_dollars | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
55 | Linn | 158353452 | 1731106.01 | 2.599613e+06 | 1500058 | 1.806775e+07 | 1221208.85 | 6.036068e+06 | 184092 | 32.788321 | 14.794975 | 221661 | 0.830511 | 5.509354 | 2164.866374 | 85.036195 | 28239 | 53674 | 69250 | 86134 | 16780528.0 |
79 | Scott | 112482102 | 1247750.11 | 1.873693e+06 | 1237025 | 1.397399e+07 | 914829.35 | 4.667888e+06 | 130419 | 35.791469 | 15.274967 | 172474 | 0.756166 | 5.304158 | 1211.649675 | 107.637548 | 27408 | 49964 | 64513 | 66765 | 14997255.0 |
50 | Johnson | 94201143 | 1094410.17 | 1.643462e+06 | 927749 | 1.237021e+07 | 784403.93 | 4.133645e+06 | 105955 | 39.013213 | 15.770199 | 146547 | 0.723010 | 5.352576 | 1374.398467 | 77.091908 | 28008 | 51380 | 74547 | 52715 | 11867678.0 |
6 | Black Hawk | 100801227 | 1107541.37 | 1.663275e+06 | 1093412 | 1.192076e+07 | 793399.13 | 3.982880e+06 | 118229 | 33.687840 | 15.024919 | 132904 | 0.889582 | 5.969716 | 1658.463133 | 71.288290 | 23357 | 44178 | 57495 | 52470 | 11283189.0 |
30 | Dubuque | 57322060 | 601825.15 | 9.038367e+05 | 501274 | 6.621999e+06 | 460080.11 | 2.212785e+06 | 60753 | 36.422653 | 14.393144 | 97003 | 0.626300 | 4.742947 | 1445.975306 | 42.015240 | 25045 | 48573 | 61138 | 36815 | 7310597.0 |
82 | Story | 65551096 | 727466.68 | 1.092625e+06 | 501832 | 6.730961e+06 | 456414.62 | 2.250349e+06 | 71636 | 31.413665 | 14.747469 | 97090 | 0.737831 | 4.700944 | 1943.690095 | 36.855670 | 25450 | 48248 | 74278 | 34736 | 7146093.0 |
94 | Woodbury | 61274387 | 672801.78 | 1.010242e+06 | 620680 | 7.695638e+06 | 512631.64 | 2.570501e+06 | 67732 | 37.951055 | 15.012023 | 102779 | 0.659006 | 4.987708 | 2591.144872 | 26.139797 | 22069 | 44343 | 55957 | 39052 | 7104739.0 |
75 | Pottawattamie | 64280187 | 680102.80 | 1.021141e+06 | 618358 | 7.675324e+06 | 521376.00 | 2.563352e+06 | 72472 | 35.370233 | 14.721285 | 93582 | 0.774422 | 5.571328 | 2303.225231 | 31.465442 | 23782 | 48728 | 60354 | 36775 | 6828297.0 |
16 | Cerro Gordo | 50439956 | 514818.96 | 7.732206e+05 | 377860 | 4.892826e+06 | 354343.00 | 1.635296e+06 | 51566 | 31.712671 | 13.808162 | 43070 | 1.197260 | 8.227142 | 1497.770139 | 34.428514 | 25463 | 44741 | 60148 | 19350 | 4328297.0 |
24 | Dallas | 19782703 | 208239.38 | 3.128030e+05 | 166032 | 2.295133e+06 | 151828.96 | 7.673762e+05 | 21284 | 36.054136 | 15.116572 | 84516 | 0.251834 | 1.796452 | 1727.389531 | 12.321483 | 33051 | 67037 | 84018 | 25240 | 4125161.0 |
28 | Des Moines | 28965403 | 311376.14 | 4.674807e+05 | 294257 | 3.579586e+06 | 235900.79 | 1.195418e+06 | 32474 | 36.811526 | 15.174117 | 39739 | 0.817182 | 5.936254 | 928.456413 | 34.976332 | 22555 | 41937 | 53946 | 17003 | 3968601.0 |
22 | Clinton | 25808600 | 255757.84 | 3.841091e+05 | 245180 | 2.928261e+06 | 214882.76 | 9.784796e+05 | 28364 | 34.497237 | 13.627250 | 47309 | 0.599548 | 4.542112 | 1701.089895 | 16.674016 | 23573 | 46170 | 58681 | 20223 | 3197853.0 |
87 | Wapello | 24021278 | 251492.94 | 3.776542e+05 | 186724 | 2.292624e+06 | 160501.09 | 7.658786e+05 | 27843 | 27.507043 | 14.284163 | 34982 | 0.795924 | 4.588105 | 1104.862750 | 25.200415 | 22376 | 40093 | 49309 | 14552 | 3190720.0 |
68 | Muscatine | 28497706 | 294376.85 | 4.421020e+05 | 208296 | 2.626114e+06 | 188612.00 | 8.775137e+05 | 31591 | 27.777331 | 13.923365 | 42940 | 0.735701 | 4.392455 | 1700.941831 | 18.572652 | 24138 | 51025 | 61445 | 16412 | 3138265.0 |
54 | Lee | 24974375 | 267566.06 | 4.017307e+05 | 228466 | 2.902471e+06 | 201627.37 | 9.692238e+05 | 27261 | 35.553493 | 14.395224 | 34615 | 0.787549 | 5.824855 | 1207.589119 | 22.574731 | 21324 | 42444 | 50630 | 14610 | 3104487.0 |
62 | Marshall | 22552328 | 231546.74 | 3.477674e+05 | 192655 | 2.504138e+06 | 168340.47 | 8.368677e+05 | 23507 | 35.600787 | 14.875434 | 40312 | 0.583127 | 4.175939 | 1520.107767 | 15.464035 | 22407 | 45232 | 55716 | 15538 | 2791982.0 |
29 | Dickinson | 27326006 | 291810.42 | 4.382073e+05 | 222744 | 3.145660e+06 | 224310.44 | 1.050838e+06 | 28110 | 37.383072 | 14.023690 | 17243 | 1.630227 | 13.008783 | 1024.114722 | 27.448097 | 29459 | 50174 | 59648 | 7554 | 2628643.0 |
91 | Webster | 22537959 | 231995.40 | 3.484066e+05 | 228252 | 2.705835e+06 | 189009.94 | 9.042916e+05 | 24106 | 37.513134 | 14.315835 | 36769 | 0.655607 | 5.140470 | 1793.295141 | 13.442294 | 22653 | 40806 | 54129 | 15580 | 2541044.0 |
88 | Warren | 19568800 | 183359.55 | 2.753840e+05 | 159019 | 1.939043e+06 | 146705.53 | 6.481145e+05 | 19620 | 33.033359 | 13.217243 | 49691 | 0.394840 | 2.952356 | 1617.493665 | 12.129877 | 28798 | 62034 | 74042 | 17262 | 2521533.0 |
48 | Jasper | 21363900 | 214003.36 | 3.214453e+05 | 131828 | 1.577656e+06 | 121386.17 | 5.274477e+05 | 23046 | 22.886734 | 12.996996 | 36708 | 0.627820 | 3.306804 | 1677.247032 | 13.740373 | 23160 | 46396 | 56484 | 14806 | 2486830.0 |
61 | Marion | 19736978 | 199812.05 | 3.000452e+05 | 133175 | 1.735448e+06 | 123141.99 | 5.798673e+05 | 20958 | 27.668067 | 14.093066 | 33189 | 0.631474 | 3.710325 | 1528.879832 | 13.708075 | 24613 | 53370 | 65817 | 12723 | 2177923.0 |
10 | Buena Vista | 20832831 | 226554.99 | 3.402451e+05 | 114147 | 1.532283e+06 | 105358.28 | 5.121251e+05 | 21598 | 23.711690 | 14.543544 | 20332 | 1.062266 | 5.181895 | 1641.768588 | 13.155325 | 21256 | 43182 | 53382 | 7522 | 2087010.0 |
81 | Sioux | 10681100 | 102003.82 | 1.532028e+05 | 77121 | 1.069835e+06 | 79019.86 | 3.575951e+05 | 10307 | 34.694396 | 13.538806 | 34898 | 0.295346 | 2.264309 | 2056.178194 | 5.012698 | 21333 | 51557 | 60043 | 11584 | 1725739.0 |
7 | Boone | 16005831 | 154407.11 | 2.319393e+05 | 120271 | 1.534234e+06 | 115554.84 | 5.129844e+05 | 16515 | 31.061724 | 13.277107 | 26532 | 0.622456 | 4.355301 | 1172.618576 | 14.083864 | 25998 | 49578 | 66872 | 10728 | 1696331.0 |
8 | Bremer | 17767453 | 173326.87 | 2.603925e+05 | 113551 | 1.487074e+06 | 114021.43 | 4.976283e+05 | 17746 | 28.041717 | 13.042060 | 24798 | 0.715622 | 4.598009 | 1254.612988 | 14.144601 | 26522 | 55676 | 68602 | 9385 | 1657708.0 |
51 | Jones | 15309100 | 144067.53 | 2.164019e+05 | 69101 | 8.898679e+05 | 68061.75 | 2.974959e+05 | 15545 | 19.137725 | 13.074420 | 20439 | 0.760556 | 3.329994 | 1603.902991 | 9.691983 | 22873 | 47955 | 59167 | 8181 | 1487368.0 |
76 | Poweshiek | 16062900 | 169084.71 | 2.539540e+05 | 97474 | 1.306022e+06 | 91675.42 | 4.365483e+05 | 17496 | 24.951321 | 14.246155 | 18533 | 0.944046 | 4.946604 | 1580.076857 | 11.072879 | 25218 | 50998 | 65744 | 7555 | 1469057.0 |
20 | Clay | 15084525 | 151187.65 | 2.270741e+05 | 104645 | 1.367561e+06 | 104603.09 | 4.569864e+05 | 15276 | 29.915320 | 13.073807 | 16333 | 0.935284 | 6.404402 | 1305.389452 | 11.702255 | 25398 | 43542 | 56460 | 7282 | 1348963.0 |
9 | Buchanan | 12168450 | 121504.75 | 1.824907e+05 | 89429 | 1.140914e+06 | 84484.18 | 3.814875e+05 | 12653 | 30.149961 | 13.504469 | 20992 | 0.602753 | 4.024589 | 1437.306002 | 8.803275 | 23437 | 51961 | 61421 | 8161 | 1269271.0 |
60 | Mahaska | 9398550 | 96610.21 | 1.451289e+05 | 70861 | 8.393837e+05 | 62911.85 | 2.806475e+05 | 9670 | 29.022495 | 13.342219 | 22181 | 0.435959 | 2.836295 | 1216.496254 | 7.949059 | 21568 | 45025 | 57877 | 8975 | 1255159.0 |
13 | Carroll | 14460725 | 143097.04 | 2.149119e+05 | 114246 | 1.556718e+06 | 115511.56 | 5.202122e+05 | 14568 | 35.709236 | 13.476729 | 20437 | 0.712825 | 5.652080 | 1790.407255 | 8.136696 | 25094 | 47507 | 61960 | 8683 | 1200276.0 |
69 | O'Brien | 13896728 | 132245.41 | 1.986505e+05 | 83029 | 1.057797e+06 | 84579.60 | 3.539820e+05 | 13954 | 25.367782 | 12.506521 | 14020 | 0.995292 | 6.032782 | 1620.431744 | 8.611285 | 24771 | 44018 | 59391 | 6069 | 1166399.0 |
40 | Hardin | 15096700 | 136991.10 | 2.058020e+05 | 119772 | 1.657238e+06 | 127383.22 | 5.542669e+05 | 13592 | 40.778908 | 13.009862 | 17226 | 0.789040 | 7.394823 | 1749.403531 | 7.769505 | 24154 | 44694 | 57612 | 7296 | 1140430.0 |
89 | Washington | 10984531 | 119302.23 | 1.791272e+05 | 84389 | 1.140731e+06 | 78241.80 | 3.810510e+05 | 11745 | 32.443678 | 14.579561 | 22281 | 0.527131 | 3.511593 | 1666.506696 | 7.047676 | 23979 | 50710 | 60466 | 8741 | 1139096.0 |
71 | Page | 10230328 | 103322.96 | 1.551729e+05 | 81116 | 1.057103e+06 | 75315.10 | 3.532156e+05 | 10651 | 33.162667 | 14.035741 | 15391 | 0.692028 | 4.893451 | 1398.118634 | 7.618095 | 21204 | 40778 | 52791 | 6393 | 1134280.0 |
47 | Jackson | 12728325 | 114843.67 | 1.724907e+05 | 89098 | 1.130876e+06 | 88084.97 | 3.778944e+05 | 12593 | 30.008287 | 12.838469 | 19472 | 0.646724 | 4.523673 | 2075.514783 | 6.067410 | 23008 | 42489 | 54210 | 8289 | 1120072.0 |
93 | Winneshiek | 11450225 | 107156.18 | 1.609449e+05 | 86483 | 1.189769e+06 | 89488.04 | 3.977413e+05 | 10541 | 37.732790 | 13.295288 | 20561 | 0.512670 | 4.352319 | 1587.428273 | 6.640300 | 23608 | 50693 | 61558 | 7997 | 1028921.0 |
85 | Union | 9569175 | 90467.23 | 1.358550e+05 | 66536 | 9.021116e+05 | 68179.74 | 3.015172e+05 | 9024 | 33.412815 | 13.231373 | 12420 | 0.726570 | 5.489512 | 1292.945614 | 6.979412 | 20435 | 40879 | 50546 | 5271 | 1018948.0 |
65 | Monona | 9671381 | 95956.98 | 1.440858e+05 | 42020 | 5.197005e+05 | 38043.92 | 1.736055e+05 | 10023 | 17.320713 | 13.660539 | 8898 | 1.126433 | 4.275559 | 1656.794562 | 6.049634 | 22774 | 41398 | 51098 | 4050 | 1017900.0 |
58 | Lyon | 9776875 | 98319.84 | 1.476554e+05 | 55318 | 7.766658e+05 | 56613.63 | 2.593880e+05 | 9647 | 26.887943 | 13.718707 | 11754 | 0.820742 | 4.816542 | 1520.471701 | 6.344742 | 21613 | 49506 | 57348 | 4442 | 962622.0 |
23 | Crawford | 9178228 | 91880.62 | 1.379796e+05 | 65790 | 8.643920e+05 | 62535.95 | 2.889041e+05 | 9360 | 30.865818 | 13.822321 | 16940 | 0.552538 | 3.691615 | 1794.525720 | 5.215863 | 21181 | 44377 | 53794 | 6413 | 960452.0 |
42 | Henry | 8214425 | 84936.53 | 1.275347e+05 | 72982 | 9.634271e+05 | 67508.46 | 3.218625e+05 | 9026 | 35.659479 | 14.271205 | 19773 | 0.456481 | 3.414174 | 1374.637080 | 6.566097 | 23056 | 41983 | 53985 | 7666 | 959348.0 |
3 | Appanoose | 8360175 | 82458.44 | 1.238400e+05 | 58261 | 7.113762e+05 | 53184.66 | 2.377927e+05 | 8582 | 27.708309 | 13.375589 | 12462 | 0.688654 | 4.267747 | 1439.625361 | 5.961273 | 20084 | 34689 | 41250 | 5627 | 955846.0 |
32 | Fayette | 9109128 | 81374.00 | 1.222869e+05 | 70231 | 8.602044e+05 | 68175.75 | 2.878219e+05 | 8730 | 32.969291 | 12.617454 | 20054 | 0.435325 | 3.399609 | 1955.698767 | 4.463878 | 21566 | 41055 | 52627 | 8634 | 955344.0 |
21 | Clayton | 11180875 | 108046.71 | 1.622809e+05 | 49974 | 6.927068e+05 | 52494.39 | 2.315675e+05 | 10480 | 22.096132 | 13.195825 | 17590 | 0.595793 | 2.984331 | 2086.377422 | 5.023060 | 22303 | 45873 | 53905 | 7599 | 954141.0 |
73 | Plymouth | 12560609 | 132624.74 | 1.991814e+05 | 94325 | 1.286709e+06 | 93061.03 | 4.300381e+05 | 13176 | 32.637984 | 13.826506 | 25200 | 0.522857 | 3.692898 | 2259.953641 | 5.830208 | 28060 | 56379 | 69261 | 9875 | 953171.0 |
14 | Cass | 10038675 | 98657.96 | 1.481477e+05 | 74583 | 1.005639e+06 | 75880.71 | 3.360579e+05 | 9988 | 33.646165 | 13.252889 | 13157 | 0.759140 | 5.767326 | 1776.544213 | 5.622151 | 21787 | 40820 | 48884 | 5980 | 931866.0 |
33 | Floyd | 8058678 | 79010.19 | 1.186743e+05 | 82689 | 1.032124e+06 | 74542.85 | 3.450133e+05 | 8223 | 41.957103 | 13.846056 | 15873 | 0.518050 | 4.696204 | 1245.579774 | 6.601745 | 21416 | 39467 | 52808 | 6886 | 913846.0 |
5 | Benton | 7790725 | 71349.04 | 1.071595e+05 | 47699 | 5.803559e+05 | 47612.84 | 1.940870e+05 | 7731 | 25.105032 | 12.189062 | 25699 | 0.300829 | 1.852712 | 1589.039934 | 4.865202 | 25111 | 54726 | 64970 | 10302 | 912250.0 |
2 | Allamakee | 9079175 | 85371.49 | 1.282384e+05 | 57833 | 7.728435e+05 | 61269.76 | 2.586304e+05 | 8595 | 30.090802 | 12.613783 | 13884 | 0.619058 | 4.412976 | 1640.663925 | 5.238733 | 21349 | 46623 | 55926 | 5845 | 839656.0 |
53 | Kossuth | 14172628 | 142789.09 | 2.144583e+05 | 105881 | 1.479812e+06 | 108007.85 | 4.948291e+05 | 13720 | 36.066263 | 13.700965 | 15114 | 0.907768 | 7.146212 | 2171.379858 | 6.318563 | 27415 | 48277 | 61012 | 6697 | 834826.0 |
64 | Mitchell | 8198000 | 80007.93 | 1.201944e+05 | 32335 | 4.147482e+05 | 34245.82 | 1.387916e+05 | 8227 | 16.870258 | 12.110916 | 10763 | 0.764378 | 3.181810 | 1086.794650 | 7.569967 | 22820 | 48506 | 63356 | 4395 | 819843.0 |
72 | Palo Alto | 9396700 | 89935.75 | 1.350478e+05 | 41949 | 5.750134e+05 | 44627.15 | 1.921719e+05 | 8667 | 22.172831 | 12.884834 | 9047 | 0.957997 | 4.932812 | 1596.128377 | 5.430014 | 23071 | 42800 | 57208 | 3994 | 797432.0 |
46 | Iowa | 11862800 | 102657.53 | 1.542814e+05 | 74451 | 9.711115e+05 | 77025.04 | 3.251289e+05 | 11162 | 29.128192 | 12.607737 | 16311 | 0.684323 | 4.722276 | 1448.032410 | 7.708391 | 26721 | 56053 | 64578 | 6677 | 789613.0 |
15 | Cedar | 7427125 | 67426.72 | 1.013055e+05 | 44550 | 4.927014e+05 | 38720.95 | 1.647957e+05 | 8160 | 20.195545 | 12.724413 | 18454 | 0.442181 | 2.098242 | 1232.378198 | 6.621344 | 24742 | 54321 | 63893 | 7511 | 748698.0 |
67 | Montgomery | 6457078 | 67065.99 | 1.006981e+05 | 45167 | 5.570222e+05 | 41624.63 | 1.861282e+05 | 6713 | 27.726534 | 13.382034 | 10225 | 0.656528 | 4.070868 | 1140.359482 | 5.886740 | 21301 | 38624 | 50595 | 4558 | 699564.0 |
92 | Winnebago | 8009300 | 72833.61 | 1.094025e+05 | 55931 | 6.827292e+05 | 56888.63 | 2.282976e+05 | 7667 | 29.776648 | 12.001155 | 10631 | 0.721193 | 5.351202 | 1295.390700 | 5.918678 | 22684 | 41871 | 58700 | 4597 | 693928.0 |
83 | Tama | 7242875 | 69664.27 | 1.046049e+05 | 47692 | 5.735157e+05 | 43573.67 | 1.916543e+05 | 7655 | 25.036489 | 13.161979 | 17319 | 0.442000 | 2.515946 | 2088.226617 | 3.665790 | 23041 | 46288 | 55011 | 6947 | 655898.0 |
38 | Hamilton | 7882325 | 77319.08 | 1.161721e+05 | 61011 | 7.369458e+05 | 56950.35 | 2.466456e+05 | 8603 | 28.669721 | 12.940146 | 15076 | 0.570642 | 3.777550 | 1425.376163 | 6.035600 | 24765 | 46188 | 61472 | 6540 | 654023.0 |
41 | Harrison | 8333900 | 82437.45 | 1.238054e+05 | 42380 | 5.046314e+05 | 38339.84 | 1.687135e+05 | 9131 | 18.476999 | 13.162064 | 14149 | 0.645346 | 2.709721 | 2000.716609 | 4.563865 | 24221 | 51303 | 63283 | 5987 | 643413.0 |
49 | Jefferson | 6377025 | 66329.72 | 9.959432e+04 | 56831 | 7.276337e+05 | 51721.61 | 2.430598e+05 | 6446 | 37.707074 | 14.068272 | 18090 | 0.356329 | 2.859127 | 1228.460352 | 5.247219 | 23853 | 44167 | 55352 | 6846 | 632088.0 |
27 | Delaware | 5929650 | 56982.70 | 8.557961e+04 | 55212 | 7.547227e+05 | 54845.42 | 2.522007e+05 | 5749 | 43.868622 | 13.760907 | 17327 | 0.331794 | 3.165315 | 1275.461107 | 4.507389 | 22578 | 47078 | 59802 | 7062 | 615781.0 |
78 | Sac | 7934625 | 75245.54 | 1.130183e+05 | 39571 | 5.258452e+05 | 42670.28 | 1.760915e+05 | 7157 | 24.604098 | 12.323453 | 9876 | 0.724686 | 4.320603 | 1412.599164 | 5.066547 | 23837 | 42986 | 54304 | 4482 | 528002.0 |
59 | Madison | 7396475 | 74647.41 | 1.121233e+05 | 54411 | 6.856734e+05 | 51462.23 | 2.293326e+05 | 7795 | 29.420481 | 13.323817 | 15848 | 0.491860 | 3.247238 | 1409.747269 | 5.529360 | 25711 | 53183 | 67099 | 6025 | 523595.0 |
17 | Cherokee | 7638400 | 72195.69 | 1.084669e+05 | 58423 | 6.855181e+05 | 52491.70 | 2.294092e+05 | 7408 | 30.967770 | 13.059553 | 11508 | 0.643726 | 4.561323 | 1404.657609 | 5.273883 | 24507 | 44635 | 56696 | 5207 | 501313.0 |
34 | Franklin | 6146325 | 60180.12 | 9.038277e+04 | 36313 | 4.522624e+05 | 35596.58 | 1.512945e+05 | 6052 | 24.999091 | 12.705219 | 10170 | 0.595084 | 3.500155 | 1346.541996 | 4.494475 | 22507 | 44863 | 52917 | 4332 | 475800.0 |
80 | Shelby | 5286953 | 53350.35 | 8.009987e+04 | 42963 | 5.279510e+05 | 38570.77 | 1.762560e+05 | 5627 | 31.323259 | 13.687852 | 11800 | 0.476864 | 3.268709 | 1299.197014 | 4.331137 | 22389 | 44085 | 55523 | 5085 | 447766.0 |
19 | Clarke | 5656450 | 54720.08 | 8.218209e+04 | 37286 | 5.035890e+05 | 36768.25 | 1.683819e+05 | 5600 | 30.068200 | 13.696300 | 9309 | 0.601568 | 3.949753 | 954.543677 | 5.866678 | 23271 | 45596 | 54707 | 3701 | 445791.0 |
31 | Emmet | 5982525 | 60879.97 | 9.144739e+04 | 38693 | 5.019728e+05 | 39072.45 | 1.678386e+05 | 6071 | 27.645955 | 12.847230 | 9658 | 0.628598 | 4.045605 | 1130.769046 | 5.368912 | 24371 | 42286 | 55844 | 4236 | 391058.0 |
57 | Lucas | 3968675 | 40531.53 | 6.087117e+04 | 29011 | 3.761450e+05 | 27391.10 | 1.257357e+05 | 4064 | 30.938898 | 13.732379 | 8647 | 0.469990 | 3.167700 | 1154.264969 | 3.520855 | 19967 | 43005 | 56647 | 3689 | 388043.0 |
56 | Louisa | 3165850 | 31041.02 | 4.661010e+04 | 20602 | 2.071461e+05 | 14809.68 | 6.917951e+04 | 3730 | 18.546786 | 13.987210 | 11142 | 0.334769 | 1.329176 | 980.907015 | 3.802603 | 20367 | 50457 | 54923 | 4346 | 378171.0 |
96 | Wright | 5575150 | 51158.24 | 7.684774e+04 | 46517 | 5.706450e+05 | 44906.82 | 1.908072e+05 | 5461 | 34.939967 | 12.707313 | 12779 | 0.427342 | 3.514111 | 1542.560045 | 3.540219 | 23068 | 44035 | 53890 | 5625 | 365167.0 |
43 | Howard | 5501853 | 50667.53 | 7.612342e+04 | 38946 | 5.268240e+05 | 42435.81 | 1.762160e+05 | 5197 | 33.907264 | 12.414609 | 9332 | 0.556901 | 4.547344 | 1553.199575 | 3.345996 | 22417 | 46068 | 55582 | 3944 | 335431.0 |
45 | Ida | 4907150 | 49504.88 | 7.433259e+04 | 35771 | 4.983864e+05 | 34949.87 | 1.665311e+05 | 4777 | 34.861015 | 14.260037 | 6985 | 0.683894 | 5.003560 | 1061.883800 | 4.498609 | 23841 | 44521 | 58635 | 3052 | 286374.0 |
35 | Greene | 5200525 | 53392.20 | 8.019890e+04 | 35511 | 4.774988e+05 | 33966.78 | 1.596316e+05 | 5248 | 30.417611 | 14.057818 | 9011 | 0.582399 | 3.769480 | 1385.511780 | 3.787770 | 23947 | 43286 | 60133 | 3996 | 255811.0 |
18 | Chickasaw | 4325975 | 37139.09 | 5.580234e+04 | 27003 | 3.508606e+05 | 30429.56 | 1.173801e+05 | 4031 | 29.119360 | 11.530256 | 12023 | 0.335274 | 2.530946 | 1332.800885 | 3.024458 | 22447 | 41372 | 50530 | 5204 | 242653.0 |
0 | Adair | 4455875 | 40182.62 | 6.037369e+04 | 33807 | 4.108056e+05 | 32522.35 | 1.374661e+05 | 4420 | 31.100932 | 12.631486 | 7092 | 0.623237 | 4.585780 | 1146.149874 | 3.856389 | 23497 | 45202 | 57287 | 3292 | 221482.0 |
44 | Humboldt | 4838675 | 46974.26 | 7.055336e+04 | 40013 | 5.050238e+05 | 38779.60 | 1.687570e+05 | 4967 | 33.975637 | 13.022925 | 9487 | 0.523559 | 4.087657 | 1325.822767 | 3.746353 | 24568 | 45282 | 57063 | 4209 | 175430.0 |
74 | Pocahontas | 4312750 | 37584.77 | 5.645243e+04 | 24363 | 2.971895e+05 | 25680.47 | 9.933154e+04 | 4167 | 23.837663 | 11.572587 | 6886 | 0.605141 | 3.729374 | 1297.431413 | 3.211730 | 23385 | 42105 | 56250 | 3233 | 166156.0 |
63 | Mills | 4317200 | 38409.79 | 5.770141e+04 | 37547 | 4.345716e+05 | 34164.64 | 1.452974e+05 | 3968 | 36.617293 | 12.719923 | 14972 | 0.265028 | 2.281902 | 1220.953340 | 3.249919 | 25400 | 59481 | 73532 | 5605 | 142072.0 |
26 | Decatur | 1798250 | 17836.60 | 2.679681e+04 | 14389 | 1.763645e+05 | 12243.11 | 5.898266e+04 | 1994 | 29.580070 | 14.405203 | 8141 | 0.244933 | 1.503883 | 1443.427332 | 1.381434 | 18195 | 37138 | 48015 | 3223 | 141811.0 |
66 | Monroe | 2891525 | 28812.61 | 4.327483e+04 | 21971 | 2.896006e+05 | 20937.96 | 9.680632e+04 | 3000 | 32.268773 | 13.831367 | 7870 | 0.381194 | 2.660478 | 960.814655 | 3.122350 | 21228 | 43245 | 53052 | 3213 | 138773.0 |
11 | Butler | 3348800 | 27994.24 | 4.206102e+04 | 24615 | 2.786351e+05 | 25010.20 | 9.331105e+04 | 3308 | 28.207693 | 11.140858 | 14791 | 0.223650 | 1.690907 | 1714.432829 | 1.929501 | 24030 | 47702 | 59641 | 6120 | 120103.0 |
12 | Calhoun | 3787450 | 30301.41 | 4.553630e+04 | 27214 | 3.215042e+05 | 27977.64 | 1.075994e+05 | 3625 | 29.682604 | 11.491468 | 9846 | 0.368170 | 2.841523 | 1593.242060 | 2.275235 | 23049 | 41611 | 50037 | 4242 | 68571.0 |
39 | Hancock | 3509425 | 28221.14 | 4.241179e+04 | 26820 | 3.061684e+05 | 28072.83 | 1.025907e+05 | 3101 | 33.083089 | 10.906219 | 10835 | 0.286202 | 2.590940 | 1435.986414 | 2.159491 | 22713 | 47318 | 55922 | 4741 | 58912.0 |
52 | Keokuk | 2720900 | 23814.27 | 3.576359e+04 | 13554 | 1.495608e+05 | 13026.40 | 4.999794e+04 | 2797 | 17.875560 | 11.481359 | 10119 | 0.276411 | 1.287321 | 1568.990538 | 1.782675 | 22088 | 42698 | 53456 | 4408 | 32440.0 |
70 | Osceola | 3107475 | 29141.20 | 4.377315e+04 | 19244 | 2.562448e+05 | 20104.98 | 8.574306e+04 | 2981 | 28.763187 | 12.745341 | 6064 | 0.491590 | 3.315465 | 967.452680 | 3.081288 | 23063 | 43889 | 58286 | 2682 | 15017.0 |
36 | Grundy | 4369250 | 36670.31 | 5.510175e+04 | 26210 | 3.018362e+05 | 26267.96 | 1.009891e+05 | 4341 | 23.264027 | 11.490660 | 12313 | 0.352554 | 2.133352 | 1097.912646 | 3.953866 | 26916 | 56184 | 68151 | 5131 | 4175.0 |
Let us order the counties with higher predicted sales:
df_pred.sort_values("sales_prediction_dollars", inplace=True, ascending=False)
df_pred[['county','sales_per_litters','population','stores_per_area','sales_prediction_dollars']].head(4)
county | sales_per_litters | population | stores_per_area | sales_prediction_dollars | |
---|---|---|---|---|---|
55 | Linn | 14.794975 | 221661 | 85.036195 | 16780528.0 |
79 | Scott | 15.274967 | 172474 | 107.637548 | 14997255.0 |
50 | Johnson | 15.770199 | 146547 | 77.091908 | 11867678.0 |
6 | Black Hawk | 15.024919 | 132904 | 71.288290 | 11283189.0 |
df_pred.sort_values("population", inplace=True, ascending=False)
df_pred[['county','sales_per_litters','population','stores_per_area','sales_prediction_dollars']].head(4)
county | sales_per_litters | population | stores_per_area | sales_prediction_dollars | |
---|---|---|---|---|---|
55 | Linn | 14.794975 | 221661 | 85.036195 | 16780528.0 |
79 | Scott | 15.274967 | 172474 | 107.637548 | 14997255.0 |
50 | Johnson | 15.770199 | 146547 | 77.091908 | 11867678.0 |
6 | Black Hawk | 15.024919 | 132904 | 71.288290 | 11283189.0 |
Linn has higher sales which in part is because it has larger population which is not very useful information.
Let us order by sales_per_litters
to see which county has more high-end stores.
df_pred.sort_values("sales_per_litters", inplace=True, ascending=False)
df_pred[['county','sales_per_litters','population','stores_per_area','sales_prediction_dollars']].head(4)
county | sales_per_litters | population | stores_per_area | sales_prediction_dollars | |
---|---|---|---|---|---|
50 | Johnson | 15.770199 | 146547 | 77.091908 | 11867678.0 |
79 | Scott | 15.274967 | 172474 | 107.637548 | 14997255.0 |
28 | Des Moines | 15.174117 | 39739 | 34.976332 | 3968601.0 |
24 | Dallas | 15.116572 | 84516 | 12.321483 | 4125161.0 |
We see that Johnson has more high-end stores. We would recommend it if the goal of the the owner is to build new high-end stores. If the plan is to open more stores but with cheaper products, Johnson is not the place to choose.
The less saturated market is Decatur. But as discussed before this information does not provide have a unique recommendation and a more thorough analysis is needed.
df_pred.sort_values("stores_per_area", inplace=True, ascending=True)
df_pred[['county','sales_per_litters','population','stores_per_area','sales_prediction_dollars']]
county | sales_per_litters | population | stores_per_area | sales_prediction_dollars | |
---|---|---|---|---|---|
26 | Decatur | 14.405203 | 8141 | 1.381434 | 141811.0 |
52 | Keokuk | 11.481359 | 10119 | 1.782675 | 32440.0 |
11 | Butler | 11.140858 | 14791 | 1.929501 | 120103.0 |
39 | Hancock | 10.906219 | 10835 | 2.159491 | 58912.0 |
12 | Calhoun | 11.491468 | 9846 | 2.275235 | 68571.0 |
18 | Chickasaw | 11.530256 | 12023 | 3.024458 | 242653.0 |
70 | Osceola | 12.745341 | 6064 | 3.081288 | 15017.0 |
66 | Monroe | 13.831367 | 7870 | 3.122350 | 138773.0 |
74 | Pocahontas | 11.572587 | 6886 | 3.211730 | 166156.0 |
63 | Mills | 12.719923 | 14972 | 3.249919 | 142072.0 |
43 | Howard | 12.414609 | 9332 | 3.345996 | 335431.0 |
57 | Lucas | 13.732379 | 8647 | 3.520855 | 388043.0 |
96 | Wright | 12.707313 | 12779 | 3.540219 | 365167.0 |
83 | Tama | 13.161979 | 17319 | 3.665790 | 655898.0 |
44 | Humboldt | 13.022925 | 9487 | 3.746353 | 175430.0 |
35 | Greene | 14.057818 | 9011 | 3.787770 | 255811.0 |
56 | Louisa | 13.987210 | 11142 | 3.802603 | 378171.0 |
0 | Adair | 12.631486 | 7092 | 3.856389 | 221482.0 |
36 | Grundy | 11.490660 | 12313 | 3.953866 | 4175.0 |
80 | Shelby | 13.687852 | 11800 | 4.331137 | 447766.0 |
32 | Fayette | 12.617454 | 20054 | 4.463878 | 955344.0 |
34 | Franklin | 12.705219 | 10170 | 4.494475 | 475800.0 |
45 | Ida | 14.260037 | 6985 | 4.498609 | 286374.0 |
27 | Delaware | 13.760907 | 17327 | 4.507389 | 615781.0 |
41 | Harrison | 13.162064 | 14149 | 4.563865 | 643413.0 |
5 | Benton | 12.189062 | 25699 | 4.865202 | 912250.0 |
81 | Sioux | 13.538806 | 34898 | 5.012698 | 1725739.0 |
21 | Clayton | 13.195825 | 17590 | 5.023060 | 954141.0 |
78 | Sac | 12.323453 | 9876 | 5.066547 | 528002.0 |
23 | Crawford | 13.822321 | 16940 | 5.215863 | 960452.0 |
2 | Allamakee | 12.613783 | 13884 | 5.238733 | 839656.0 |
49 | Jefferson | 14.068272 | 18090 | 5.247219 | 632088.0 |
17 | Cherokee | 13.059553 | 11508 | 5.273883 | 501313.0 |
31 | Emmet | 12.847230 | 9658 | 5.368912 | 391058.0 |
72 | Palo Alto | 12.884834 | 9047 | 5.430014 | 797432.0 |
59 | Madison | 13.323817 | 15848 | 5.529360 | 523595.0 |
14 | Cass | 13.252889 | 13157 | 5.622151 | 931866.0 |
73 | Plymouth | 13.826506 | 25200 | 5.830208 | 953171.0 |
19 | Clarke | 13.696300 | 9309 | 5.866678 | 445791.0 |
67 | Montgomery | 13.382034 | 10225 | 5.886740 | 699564.0 |
92 | Winnebago | 12.001155 | 10631 | 5.918678 | 693928.0 |
3 | Appanoose | 13.375589 | 12462 | 5.961273 | 955846.0 |
38 | Hamilton | 12.940146 | 15076 | 6.035600 | 654023.0 |
65 | Monona | 13.660539 | 8898 | 6.049634 | 1017900.0 |
47 | Jackson | 12.838469 | 19472 | 6.067410 | 1120072.0 |
53 | Kossuth | 13.700965 | 15114 | 6.318563 | 834826.0 |
58 | Lyon | 13.718707 | 11754 | 6.344742 | 962622.0 |
42 | Henry | 14.271205 | 19773 | 6.566097 | 959348.0 |
33 | Floyd | 13.846056 | 15873 | 6.601745 | 913846.0 |
15 | Cedar | 12.724413 | 18454 | 6.621344 | 748698.0 |
93 | Winneshiek | 13.295288 | 20561 | 6.640300 | 1028921.0 |
85 | Union | 13.231373 | 12420 | 6.979412 | 1018948.0 |
89 | Washington | 14.579561 | 22281 | 7.047676 | 1139096.0 |
64 | Mitchell | 12.110916 | 10763 | 7.569967 | 819843.0 |
71 | Page | 14.035741 | 15391 | 7.618095 | 1134280.0 |
46 | Iowa | 12.607737 | 16311 | 7.708391 | 789613.0 |
40 | Hardin | 13.009862 | 17226 | 7.769505 | 1140430.0 |
60 | Mahaska | 13.342219 | 22181 | 7.949059 | 1255159.0 |
13 | Carroll | 13.476729 | 20437 | 8.136696 | 1200276.0 |
69 | O'Brien | 12.506521 | 14020 | 8.611285 | 1166399.0 |
9 | Buchanan | 13.504469 | 20992 | 8.803275 | 1269271.0 |
51 | Jones | 13.074420 | 20439 | 9.691983 | 1487368.0 |
76 | Poweshiek | 14.246155 | 18533 | 11.072879 | 1469057.0 |
20 | Clay | 13.073807 | 16333 | 11.702255 | 1348963.0 |
88 | Warren | 13.217243 | 49691 | 12.129877 | 2521533.0 |
24 | Dallas | 15.116572 | 84516 | 12.321483 | 4125161.0 |
10 | Buena Vista | 14.543544 | 20332 | 13.155325 | 2087010.0 |
91 | Webster | 14.315835 | 36769 | 13.442294 | 2541044.0 |
61 | Marion | 14.093066 | 33189 | 13.708075 | 2177923.0 |
48 | Jasper | 12.996996 | 36708 | 13.740373 | 2486830.0 |
7 | Boone | 13.277107 | 26532 | 14.083864 | 1696331.0 |
8 | Bremer | 13.042060 | 24798 | 14.144601 | 1657708.0 |
62 | Marshall | 14.875434 | 40312 | 15.464035 | 2791982.0 |
22 | Clinton | 13.627250 | 47309 | 16.674016 | 3197853.0 |
68 | Muscatine | 13.923365 | 42940 | 18.572652 | 3138265.0 |
54 | Lee | 14.395224 | 34615 | 22.574731 | 3104487.0 |
87 | Wapello | 14.284163 | 34982 | 25.200415 | 3190720.0 |
94 | Woodbury | 15.012023 | 102779 | 26.139797 | 7104739.0 |
29 | Dickinson | 14.023690 | 17243 | 27.448097 | 2628643.0 |
75 | Pottawattamie | 14.721285 | 93582 | 31.465442 | 6828297.0 |
16 | Cerro Gordo | 13.808162 | 43070 | 34.428514 | 4328297.0 |
28 | Des Moines | 15.174117 | 39739 | 34.976332 | 3968601.0 |
82 | Story | 14.747469 | 97090 | 36.855670 | 7146093.0 |
30 | Dubuque | 14.393144 | 97003 | 42.015240 | 7310597.0 |
6 | Black Hawk | 15.024919 | 132904 | 71.288290 | 11283189.0 |
50 | Johnson | 15.770199 | 146547 | 77.091908 | 11867678.0 |
55 | Linn | 14.794975 | 221661 | 85.036195 | 16780528.0 |
79 | Scott | 15.274967 | 172474 | 107.637548 | 14997255.0 |
The county with weaker competition is Butler. This could provided untapped potential. However, the absence of a reasonable number of stores may indicate, as observed before, that the county's population is simply not interested in this category of product. Again, further investigation must be carried out.
df_pred.sort_values("store_population_ratio", inplace=True, ascending=True)
df_pred[['county','sales_per_litters','population','stores_per_area','sales_prediction_dollars','store_population_ratio']]
county | sales_per_litters | population | stores_per_area | sales_prediction_dollars | store_population_ratio | |
---|---|---|---|---|---|---|
11 | Butler | 11.140858 | 14791 | 1.929501 | 120103.0 | 0.223650 |
26 | Decatur | 14.405203 | 8141 | 1.381434 | 141811.0 | 0.244933 |
24 | Dallas | 15.116572 | 84516 | 12.321483 | 4125161.0 | 0.251834 |
63 | Mills | 12.719923 | 14972 | 3.249919 | 142072.0 | 0.265028 |
52 | Keokuk | 11.481359 | 10119 | 1.782675 | 32440.0 | 0.276411 |
39 | Hancock | 10.906219 | 10835 | 2.159491 | 58912.0 | 0.286202 |
81 | Sioux | 13.538806 | 34898 | 5.012698 | 1725739.0 | 0.295346 |
5 | Benton | 12.189062 | 25699 | 4.865202 | 912250.0 | 0.300829 |
27 | Delaware | 13.760907 | 17327 | 4.507389 | 615781.0 | 0.331794 |
56 | Louisa | 13.987210 | 11142 | 3.802603 | 378171.0 | 0.334769 |
18 | Chickasaw | 11.530256 | 12023 | 3.024458 | 242653.0 | 0.335274 |
36 | Grundy | 11.490660 | 12313 | 3.953866 | 4175.0 | 0.352554 |
49 | Jefferson | 14.068272 | 18090 | 5.247219 | 632088.0 | 0.356329 |
12 | Calhoun | 11.491468 | 9846 | 2.275235 | 68571.0 | 0.368170 |
66 | Monroe | 13.831367 | 7870 | 3.122350 | 138773.0 | 0.381194 |
88 | Warren | 13.217243 | 49691 | 12.129877 | 2521533.0 | 0.394840 |
96 | Wright | 12.707313 | 12779 | 3.540219 | 365167.0 | 0.427342 |
32 | Fayette | 12.617454 | 20054 | 4.463878 | 955344.0 | 0.435325 |
60 | Mahaska | 13.342219 | 22181 | 7.949059 | 1255159.0 | 0.435959 |
83 | Tama | 13.161979 | 17319 | 3.665790 | 655898.0 | 0.442000 |
15 | Cedar | 12.724413 | 18454 | 6.621344 | 748698.0 | 0.442181 |
42 | Henry | 14.271205 | 19773 | 6.566097 | 959348.0 | 0.456481 |
57 | Lucas | 13.732379 | 8647 | 3.520855 | 388043.0 | 0.469990 |
80 | Shelby | 13.687852 | 11800 | 4.331137 | 447766.0 | 0.476864 |
70 | Osceola | 12.745341 | 6064 | 3.081288 | 15017.0 | 0.491590 |
59 | Madison | 13.323817 | 15848 | 5.529360 | 523595.0 | 0.491860 |
93 | Winneshiek | 13.295288 | 20561 | 6.640300 | 1028921.0 | 0.512670 |
33 | Floyd | 13.846056 | 15873 | 6.601745 | 913846.0 | 0.518050 |
73 | Plymouth | 13.826506 | 25200 | 5.830208 | 953171.0 | 0.522857 |
44 | Humboldt | 13.022925 | 9487 | 3.746353 | 175430.0 | 0.523559 |
89 | Washington | 14.579561 | 22281 | 7.047676 | 1139096.0 | 0.527131 |
23 | Crawford | 13.822321 | 16940 | 5.215863 | 960452.0 | 0.552538 |
43 | Howard | 12.414609 | 9332 | 3.345996 | 335431.0 | 0.556901 |
38 | Hamilton | 12.940146 | 15076 | 6.035600 | 654023.0 | 0.570642 |
35 | Greene | 14.057818 | 9011 | 3.787770 | 255811.0 | 0.582399 |
62 | Marshall | 14.875434 | 40312 | 15.464035 | 2791982.0 | 0.583127 |
34 | Franklin | 12.705219 | 10170 | 4.494475 | 475800.0 | 0.595084 |
21 | Clayton | 13.195825 | 17590 | 5.023060 | 954141.0 | 0.595793 |
22 | Clinton | 13.627250 | 47309 | 16.674016 | 3197853.0 | 0.599548 |
19 | Clarke | 13.696300 | 9309 | 5.866678 | 445791.0 | 0.601568 |
9 | Buchanan | 13.504469 | 20992 | 8.803275 | 1269271.0 | 0.602753 |
74 | Pocahontas | 11.572587 | 6886 | 3.211730 | 166156.0 | 0.605141 |
2 | Allamakee | 12.613783 | 13884 | 5.238733 | 839656.0 | 0.619058 |
7 | Boone | 13.277107 | 26532 | 14.083864 | 1696331.0 | 0.622456 |
0 | Adair | 12.631486 | 7092 | 3.856389 | 221482.0 | 0.623237 |
30 | Dubuque | 14.393144 | 97003 | 42.015240 | 7310597.0 | 0.626300 |
48 | Jasper | 12.996996 | 36708 | 13.740373 | 2486830.0 | 0.627820 |
31 | Emmet | 12.847230 | 9658 | 5.368912 | 391058.0 | 0.628598 |
61 | Marion | 14.093066 | 33189 | 13.708075 | 2177923.0 | 0.631474 |
17 | Cherokee | 13.059553 | 11508 | 5.273883 | 501313.0 | 0.643726 |
41 | Harrison | 13.162064 | 14149 | 4.563865 | 643413.0 | 0.645346 |
47 | Jackson | 12.838469 | 19472 | 6.067410 | 1120072.0 | 0.646724 |
91 | Webster | 14.315835 | 36769 | 13.442294 | 2541044.0 | 0.655607 |
67 | Montgomery | 13.382034 | 10225 | 5.886740 | 699564.0 | 0.656528 |
94 | Woodbury | 15.012023 | 102779 | 26.139797 | 7104739.0 | 0.659006 |
45 | Ida | 14.260037 | 6985 | 4.498609 | 286374.0 | 0.683894 |
46 | Iowa | 12.607737 | 16311 | 7.708391 | 789613.0 | 0.684323 |
3 | Appanoose | 13.375589 | 12462 | 5.961273 | 955846.0 | 0.688654 |
71 | Page | 14.035741 | 15391 | 7.618095 | 1134280.0 | 0.692028 |
13 | Carroll | 13.476729 | 20437 | 8.136696 | 1200276.0 | 0.712825 |
8 | Bremer | 13.042060 | 24798 | 14.144601 | 1657708.0 | 0.715622 |
92 | Winnebago | 12.001155 | 10631 | 5.918678 | 693928.0 | 0.721193 |
50 | Johnson | 15.770199 | 146547 | 77.091908 | 11867678.0 | 0.723010 |
78 | Sac | 12.323453 | 9876 | 5.066547 | 528002.0 | 0.724686 |
85 | Union | 13.231373 | 12420 | 6.979412 | 1018948.0 | 0.726570 |
68 | Muscatine | 13.923365 | 42940 | 18.572652 | 3138265.0 | 0.735701 |
82 | Story | 14.747469 | 97090 | 36.855670 | 7146093.0 | 0.737831 |
79 | Scott | 15.274967 | 172474 | 107.637548 | 14997255.0 | 0.756166 |
14 | Cass | 13.252889 | 13157 | 5.622151 | 931866.0 | 0.759140 |
51 | Jones | 13.074420 | 20439 | 9.691983 | 1487368.0 | 0.760556 |
64 | Mitchell | 12.110916 | 10763 | 7.569967 | 819843.0 | 0.764378 |
75 | Pottawattamie | 14.721285 | 93582 | 31.465442 | 6828297.0 | 0.774422 |
54 | Lee | 14.395224 | 34615 | 22.574731 | 3104487.0 | 0.787549 |
40 | Hardin | 13.009862 | 17226 | 7.769505 | 1140430.0 | 0.789040 |
87 | Wapello | 14.284163 | 34982 | 25.200415 | 3190720.0 | 0.795924 |
28 | Des Moines | 15.174117 | 39739 | 34.976332 | 3968601.0 | 0.817182 |
58 | Lyon | 13.718707 | 11754 | 6.344742 | 962622.0 | 0.820742 |
55 | Linn | 14.794975 | 221661 | 85.036195 | 16780528.0 | 0.830511 |
6 | Black Hawk | 15.024919 | 132904 | 71.288290 | 11283189.0 | 0.889582 |
53 | Kossuth | 13.700965 | 15114 | 6.318563 | 834826.0 | 0.907768 |
20 | Clay | 13.073807 | 16333 | 11.702255 | 1348963.0 | 0.935284 |
76 | Poweshiek | 14.246155 | 18533 | 11.072879 | 1469057.0 | 0.944046 |
72 | Palo Alto | 12.884834 | 9047 | 5.430014 | 797432.0 | 0.957997 |
69 | O'Brien | 12.506521 | 14020 | 8.611285 | 1166399.0 | 0.995292 |
10 | Buena Vista | 14.543544 | 20332 | 13.155325 | 2087010.0 | 1.062266 |
65 | Monona | 13.660539 | 8898 | 6.049634 | 1017900.0 | 1.126433 |
16 | Cerro Gordo | 13.808162 | 43070 | 34.428514 | 4328297.0 | 1.197260 |
29 | Dickinson | 14.023690 | 17243 | 27.448097 | 2628643.0 | 1.630227 |
[1] https://jocelyn-ong.github.io/iowa-liquor-sales
[2] https://medium.com/@neil.liberman/where-to-open-a-liquor-store-in-iowa-bb85af614563
[3] https://www.youtube.com/watch?v=PL_GBlcBNCk&t=725s)
[5] https://www.iowa-demographics.com/counties_by_population
[6] https://github.com/mwaskom/seaborn/issues/1126
[7] Karamshuk et al: Geo-spotting: mining online location-based services for optimal retail store placement
Consider the following two columns of the original dataset:
df_raw[['State Bottle Cost','State Bottle Retail']].head()
State Bottle Cost | State Bottle Retail | |
---|---|---|
0 | $18.09 | $27.14 |
1 | $18.09 | $27.14 |
2 | $6.40 | $9.60 |
3 | $35.55 | $53.34 |
4 | $6.40 | $9.60 |
The state of Iowa is a wholesale seller to its stores. The cost above is the amount paid per ordered bottle by the Alcoholic Beverages Division and the retail prices are actually the prices stores paid to the State. In the dataset there is no information regarding the end user (the consumer). Following previous works we will use the difference between these columns as a proxy for profit. This can be done approximately if one assumes e.g. that the profit margin is essentially flat and we then set the retail price as end user price. Also, we must assume that the delay between the purchase from the State and the sale to the end user is not too large.