Python is an amazing language, and this is a comprehensive tutorial on using it for Data Science.
I can also write this text within Jupyter by changing the cell type to Markdown in the dropdown. That's what I just did. In Markdown, changing the font size is easy: prefix a line with #, ##, or ### (the more # signs, the smaller the font). For a non-numbered list, prefix each item with a -
Installation is done using pip or easy_install (from setuptools). Here we show how to install the Pandas package from the Jupyter Notebook itself. I use the --upgrade flag to upgrade it, and I install Bokeh using easy_install. Pandas is the Python library for data analysis, and Bokeh helps make interactive visualizations. Note the ! sign before the sudo command: it lets me use the terminal without leaving the comfort of my Jupyter Notebook.
! sudo pip install pandas --upgrade
The directory '/home/ajay/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag. You are using pip version 7.1.0, however version 7.1.2 is available. You should consider upgrading via the 'pip install --upgrade pip' command. The directory '/home/ajay/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag. Collecting pandas Downloading pandas-0.17.1.tar.gz (6.7MB) 100% |████████████████████████████████| 6.7MB 40kB/s Collecting python-dateutil (from pandas) Downloading python_dateutil-2.4.2-py2.py3-none-any.whl (188kB) 100% |████████████████████████████████| 192kB 1.3MB/s Collecting pytz>=2011k (from pandas) Downloading pytz-2015.7-py2.py3-none-any.whl (476kB) 100% |████████████████████████████████| 479kB 92kB/s Collecting numpy>=1.7.0 (from pandas) Downloading numpy-1.10.1.tar.gz (4.0MB) 100% |████████████████████████████████| 4.1MB 75kB/s Collecting six>=1.5 (from python-dateutil->pandas) Downloading six-1.10.0-py2.py3-none-any.whl Installing collected packages: six, python-dateutil, pytz, numpy, pandas Found existing installation: six 1.9.0 Uninstalling six-1.9.0: Successfully uninstalled six-1.9.0 Found existing installation: python-dateutil 1.5 Uninstalling python-dateutil-1.5: Successfully uninstalled python-dateutil-1.5 Found existing installation: pytz 2015.2 Uninstalling pytz-2015.2: Successfully uninstalled pytz-2015.2 Found existing installation: numpy 1.9.2 Uninstalling numpy-1.9.2: Successfully uninstalled numpy-1.9.2 Running setup.py install for numpy Found existing installation: pandas 0.16.0 Uninstalling pandas-0.16.0: Successfully uninstalled pandas-0.16.0 Running setup.py install for pandas Successfully installed 
numpy-1.10.1 pandas-0.17.1 python-dateutil-2.4.2 pytz-2015.7 six-1.10.0
! sudo easy_install bokeh
Searching for bokeh Reading https://pypi.python.org/simple/bokeh/ Best match: bokeh 0.10.0 Downloading https://pypi.python.org/packages/source/b/bokeh/bokeh-0.10.0.zip#md5=1432ed7d3034ce0c16c9f3c6388ad10d Processing bokeh-0.10.0.zip Writing /tmp/easy_install-CSs4Vk/bokeh-0.10.0/setup.cfg Running bokeh-0.10.0/setup.py -q bdist_egg --dist-dir /tmp/easy_install-CSs4Vk/bokeh-0.10.0/egg-dist-tmp-45CSzc package init file 'bokeh/models/tests/__init__.py' not found (or not a regular file) package init file 'bokeh/charts/builder/tests/__init__.py' not found (or not a regular file) package init file 'bokeh/charts/tests/__init__.py' not found (or not a regular file) package init file 'bokeh/_legacy_charts/tests/__init__.py' not found (or not a regular file) package init file 'bokeh/server/tests/__init__.py' not found (or not a regular file) package init file 'bokeh/tests/__init__.py' not found (or not a regular file) package init file 'bokeh/util/tests/__init__.py' not found (or not a regular file) creating /usr/local/lib/python2.7/dist-packages/bokeh-0.10.0-py2.7.egg Extracting bokeh-0.10.0-py2.7.egg to /usr/local/lib/python2.7/dist-packages Adding bokeh 0.10.0 to easy-install.pth file Installing bokeh-server script to /usr/local/bin Installing websocket_worker.py script to /usr/local/bin Installed /usr/local/lib/python2.7/dist-packages/bokeh-0.10.0-py2.7.egg Processing dependencies for bokeh Finished processing dependencies for bokeh
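Before reinstalling, it can be handy to check from inside the notebook whether a package is already importable. The following is a minimal sketch using only the standard library; the helper name `is_installed` is made up for illustration.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the named top-level package can be found for import."""
    return importlib.util.find_spec(package_name) is not None


# Report on the two packages installed above
for name in ["pandas", "bokeh"]:
    status = "available" if is_installed(name) else "missing"
    print(name, status)
```

This only tells you that the package can be found, not which version is installed.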
You can load a Python package in the following ways: import PACKAGE, import PACKAGE as PK, or from PACKAGE import FUN.
You can then invoke the function using
PACKAGE.FUN, PK.FUN and FUN respectively.
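As a small illustration of the three calling styles, here is a sketch that uses the standard-library `math` module as a stand-in for PACKAGE and `sqrt` as a stand-in for FUN (both chosen only for the example).

```python
# Style 1: import the package, call PACKAGE.FUN
import math
a = math.sqrt(16)

# Style 2: import with an alias, call PK.FUN
import math as m
b = m.sqrt(16)

# Style 3: import the function itself, call FUN
from math import sqrt
c = sqrt(16)

print(a, b, c)
```

All three calls reach the same function; the styles differ only in how much you type and how clearly the origin of the function shows in the code.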
from datetime import datetime
Starttime =datetime.now()
Starttime
datetime.datetime(2015, 12, 2, 22, 30, 1, 850119)
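A start time like this is typically used to measure how long something takes. Here is a minimal sketch; the variable names and the toy workload are assumptions for illustration only.

```python
from datetime import datetime

start_time = datetime.now()

# Some work to time (a toy computation)
total = sum(i * i for i in range(100000))

# Subtracting two datetime objects gives a timedelta
elapsed = datetime.now() - start_time
print("Elapsed seconds:", elapsed.total_seconds())
```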
import pandas as pd
Let's import some datasets. We will use datasets bundled with the R language from https://vincentarelbundock.github.io/Rdatasets/datasets.html
diamonds =pd.read_csv("https://vincentarelbundock.github.io/Rdatasets/csv/ggplot2/diamonds.csv")
diamonds.columns  # A single-line comment starts with #
# The variable names are given by columns. In R we would use the command names(object)
# Note also that R uses the FUNCTION(OBJECTNAME) syntax while Python uses OBJECTNAME.FUNCTION
Index(['Unnamed: 0', 'carat', 'cut', 'color', 'clarity', 'depth', 'table', 'price', 'x', 'y', 'z'], dtype='object')
len(diamonds) #gives the number of rows
53940
0.0001*len(diamonds)
5.394
round(0.0001*len(diamonds))
5
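One detail worth knowing about `round`: assuming Python 3, calling it with a single argument returns an integer, and exact halves are rounded to the nearest even number rather than always up. A short sketch:

```python
n_rows = 53940

# A 0.01% sample size, rounded to a whole number of rows
sample_size = round(0.0001 * n_rows)
print(sample_size, type(sample_size))

# Exact halves go to the nearest even integer
print(round(0.5), round(1.5), round(2.5))
```

This matters when the result is used as a count, as it is for the sample size later on.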
'''Let's get some information on the object.
In R we would get this with the str command (for structure).
In Python, str turns the object into a string.
This was a multi-line comment using three single quote marks.
'''
diamonds.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 53940 entries, 0 to 53939
Data columns (total 11 columns):
Unnamed: 0    53940 non-null int64
carat         53940 non-null float64
cut           53940 non-null object
color         53940 non-null object
clarity       53940 non-null object
depth         53940 non-null float64
table         53940 non-null float64
price         53940 non-null int64
x             53940 non-null float64
y             53940 non-null float64
z             53940 non-null float64
dtypes: float64(6), int64(2), object(3)
memory usage: 4.3+ MB
diamonds.head(10) #we check the first 10 rows in the dataset
| | Unnamed: 0 | carat | cut | color | clarity | depth | table | price | x | y | z |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0.23 | Ideal | E | SI2 | 61.5 | 55 | 326 | 3.95 | 3.98 | 2.43 |
1 | 2 | 0.21 | Premium | E | SI1 | 59.8 | 61 | 326 | 3.89 | 3.84 | 2.31 |
2 | 3 | 0.23 | Good | E | VS1 | 56.9 | 65 | 327 | 4.05 | 4.07 | 2.31 |
3 | 4 | 0.29 | Premium | I | VS2 | 62.4 | 58 | 334 | 4.20 | 4.23 | 2.63 |
4 | 5 | 0.31 | Good | J | SI2 | 63.3 | 58 | 335 | 4.34 | 4.35 | 2.75 |
5 | 6 | 0.24 | Very Good | J | VVS2 | 62.8 | 57 | 336 | 3.94 | 3.96 | 2.48 |
6 | 7 | 0.24 | Very Good | I | VVS1 | 62.3 | 57 | 336 | 3.95 | 3.98 | 2.47 |
7 | 8 | 0.26 | Very Good | H | SI1 | 61.9 | 55 | 337 | 4.07 | 4.11 | 2.53 |
8 | 9 | 0.22 | Fair | E | VS2 | 65.1 | 61 | 337 | 3.87 | 3.78 | 2.49 |
9 | 10 | 0.23 | Very Good | H | VS1 | 59.4 | 61 | 338 | 4.00 | 4.05 | 2.39 |
diamonds.loc[20:30]  # .loc selects rows by label; the older .ix indexer has been removed from pandas
| | Unnamed: 0 | carat | cut | color | clarity | depth | table | price | x | y | z |
---|---|---|---|---|---|---|---|---|---|---|---|
20 | 21 | 0.30 | Good | I | SI2 | 63.3 | 56 | 351 | 4.26 | 4.30 | 2.71 |
21 | 22 | 0.23 | Very Good | E | VS2 | 63.8 | 55 | 352 | 3.85 | 3.92 | 2.48 |
22 | 23 | 0.23 | Very Good | H | VS1 | 61.0 | 57 | 353 | 3.94 | 3.96 | 2.41 |
23 | 24 | 0.31 | Very Good | J | SI1 | 59.4 | 62 | 353 | 4.39 | 4.43 | 2.62 |
24 | 25 | 0.31 | Very Good | J | SI1 | 58.1 | 62 | 353 | 4.44 | 4.47 | 2.59 |
25 | 26 | 0.23 | Very Good | G | VVS2 | 60.4 | 58 | 354 | 3.97 | 4.01 | 2.41 |
26 | 27 | 0.24 | Premium | I | VS1 | 62.5 | 57 | 355 | 3.97 | 3.94 | 2.47 |
27 | 28 | 0.30 | Very Good | J | VS2 | 62.2 | 57 | 357 | 4.28 | 4.30 | 2.67 |
28 | 29 | 0.23 | Very Good | D | VS2 | 60.5 | 61 | 357 | 3.96 | 3.97 | 2.40 |
29 | 30 | 0.23 | Very Good | F | VS1 | 60.9 | 57 | 357 | 3.96 | 3.99 | 2.42 |
30 | 31 | 0.23 | Very Good | F | VS1 | 60.0 | 57 | 402 | 4.00 | 4.03 | 2.41 |
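Note that the slice above returned 11 rows (20 through 30 inclusive), because label-based slicing includes the end label. Position-based slicing with `.iloc` follows the usual Python convention and excludes the end. A minimal sketch on a made-up DataFrame:

```python
import pandas as pd

# Toy data with the default integer index 0..99
df = pd.DataFrame({"value": range(100)})

by_label = df.loc[20:30]      # label-based: includes label 30
by_position = df.iloc[20:30]  # position-based: stops before position 30

print(len(by_label), len(by_position))
```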
# To refer to a particular column I use its name
# I can also chain the commands
diamonds.loc[20:25].cut  # .loc replaces the .ix indexer, which has been removed from pandas
20         Good
21    Very Good
22    Very Good
23    Very Good
24    Very Good
25    Very Good
Name: cut, dtype: object
diamonds.loc[20:25]["color"]
20    I
21    E
22    H
23    J
24    J
25    G
Name: color, dtype: object
import numpy as np
rows = np.random.choice(diamonds.index.values, round(0.0001*len(diamonds)))
print(rows)
[42122 21399 40554 36399 50336]
diamonds.loc[rows]
| | Unnamed: 0 | carat | cut | color | clarity | depth | table | price | x | y | z |
---|---|---|---|---|---|---|---|---|---|---|---|
42122 | 42123 | 0.58 | Ideal | I | VS2 | 62.0 | 54 | 1279 | 5.35 | 5.39 | 3.33 |
21399 | 21400 | 1.51 | Ideal | E | SI2 | 62.9 | 57 | 9513 | 7.29 | 7.23 | 4.57 |
40554 | 40555 | 0.41 | Ideal | G | VVS1 | 61.7 | 55 | 1151 | 4.77 | 4.79 | 2.95 |
36399 | 36400 | 0.31 | Ideal | E | VS1 | 61.8 | 56 | 942 | 4.37 | 4.34 | 2.69 |
50336 | 50337 | 0.70 | Good | D | SI2 | 58.3 | 60 | 2242 | 5.81 | 5.89 | 3.41 |
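Keep in mind that `np.random.choice` samples with replacement by default, so the same row could in principle be picked twice. An alternative is the DataFrame's own `sample` method, which samples without replacement by default. The sketch below uses a made-up DataFrame and an arbitrary seed.

```python
import pandas as pd

# Toy data standing in for the diamonds table
df = pd.DataFrame({"price": range(1000)})

# 0.5% of the rows, as a whole number
n = round(0.005 * len(df))

# random_state makes the sample reproducible (42 is an arbitrary choice)
sampled = df.sample(n=n, random_state=42)
print(sampled)
```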
## Missing Values
diamonds= diamonds.dropna(how='any')
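Before dropping rows it helps to see how many values are missing, and to understand the difference between `how='any'` and `how='all'`. A minimal sketch on made-up data:

```python
import numpy as np
import pandas as pd

# Toy data with some missing values
df = pd.DataFrame({
    "carat": [0.23, np.nan, 0.31, np.nan],
    "price": [326, 327, np.nan, np.nan],
})

# Count of missing values in each column
missing_per_column = df.isnull().sum()

# how='any' drops a row if at least one value is missing
any_dropped = df.dropna(how="any")

# how='all' drops a row only if every value is missing
all_dropped = df.dropna(how="all")

print(missing_per_column)
print(len(any_dropped), len(all_dropped))
```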
We now do summaries for numerical and categorical data.
diamonds.describe()
| | Unnamed: 0 | carat | depth | table | price | x | y | z |
---|---|---|---|---|---|---|---|---|
count | 53940.000000 | 53940.000000 | 53940.000000 | 53940.000000 | 53940.000000 | 53940.000000 | 53940.000000 | 53940.000000 |
mean | 26970.500000 | 0.797940 | 61.749405 | 57.457184 | 3932.799722 | 5.731157 | 5.734526 | 3.538734 |
std | 15571.281097 | 0.474011 | 1.432621 | 2.234491 | 3989.439738 | 1.121761 | 1.142135 | 0.705699 |
min | 1.000000 | 0.200000 | 43.000000 | 43.000000 | 326.000000 | 0.000000 | 0.000000 | 0.000000 |
25% | 13485.750000 | 0.400000 | 61.000000 | 56.000000 | 950.000000 | 4.710000 | 4.720000 | 2.910000 |
50% | 26970.500000 | 0.700000 | 61.800000 | 57.000000 | 2401.000000 | 5.700000 | 5.710000 | 3.530000 |
75% | 40455.250000 | 1.040000 | 62.500000 | 59.000000 | 5324.250000 | 6.540000 | 6.540000 | 4.040000 |
max | 53940.000000 | 5.010000 | 79.000000 | 95.000000 | 18823.000000 | 10.740000 | 58.900000 | 31.800000 |
diamonds.price.describe()
count    53940.000000
mean      3932.799722
std       3989.439738
min        326.000000
25%        950.000000
50%       2401.000000
75%       5324.250000
max      18823.000000
Name: price, dtype: float64
diamonds.corr() #Numerical correlations
| | Unnamed: 0 | carat | depth | table | price | x | y | z |
---|---|---|---|---|---|---|---|---|
Unnamed: 0 | 1.000000 | -0.377983 | -0.034800 | -0.100830 | -0.306873 | -0.405440 | -0.395843 | -0.399208 |
carat | -0.377983 | 1.000000 | 0.028224 | 0.181618 | 0.921591 | 0.975094 | 0.951722 | 0.953387 |
depth | -0.034800 | 0.028224 | 1.000000 | -0.295779 | -0.010647 | -0.025289 | -0.029341 | 0.094924 |
table | -0.100830 | 0.181618 | -0.295779 | 1.000000 | 0.127134 | 0.195344 | 0.183760 | 0.150929 |
price | -0.306873 | 0.921591 | -0.010647 | 0.127134 | 1.000000 | 0.884435 | 0.865421 | 0.861249 |
x | -0.405440 | 0.975094 | -0.025289 | 0.195344 | 0.884435 | 1.000000 | 0.974701 | 0.970772 |
y | -0.395843 | 0.951722 | -0.029341 | 0.183760 | 0.865421 | 0.974701 | 1.000000 | 0.952006 |
z | -0.399208 | 0.953387 | 0.094924 | 0.150929 | 0.861249 | 0.970772 | 0.952006 | 1.000000 |
diamonds.corr()>0.5
| | Unnamed: 0 | carat | depth | table | price | x | y | z |
---|---|---|---|---|---|---|---|---|
Unnamed: 0 | True | False | False | False | False | False | False | False |
carat | False | True | False | False | True | True | True | True |
depth | False | False | True | False | False | False | False | False |
table | False | False | False | True | False | False | False | False |
price | False | True | False | False | True | True | True | True |
x | False | True | False | False | True | True | True | True |
y | False | True | False | False | True | True | True | True |
z | False | True | False | False | True | True | True | True |
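In recent pandas versions, calling `corr()` on a DataFrame that still contains text columns may raise an error instead of silently skipping them. One way around this, sketched here on made-up data, is to select the numeric columns first.

```python
import pandas as pd

# Toy data: two numeric columns and one text column
df = pd.DataFrame({
    "carat": [0.2, 0.3, 0.5, 0.7, 1.0],
    "price": [300, 400, 900, 2000, 4500],
    "cut": ["Good", "Ideal", "Fair", "Ideal", "Good"],
})

# Keep only the numeric columns, then correlate
numeric = df.select_dtypes(include="number")
corr = numeric.corr()

# Boolean table marking correlations above 0.5
strong = corr > 0.5
print(strong)
```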
diamonds['cut'].unique() #To get unique values
array(['Ideal', 'Premium', 'Good', 'Very Good', 'Fair'], dtype=object)
diamonds['clarity'].unique()
array(['SI2', 'SI1', 'VS1', 'VS2', 'VVS2', 'VVS1', 'I1', 'IF'], dtype=object)
pd.value_counts(diamonds.cut)
Ideal        21551
Premium      13791
Very Good    12082
Good          4906
Fair          1610
dtype: int64
pd.value_counts(diamonds.color)
G    11292
E     9797
F     9542
H     8304
D     6775
I     5422
J     2808
dtype: int64
pd.crosstab(diamonds.cut,diamonds.color)
color | D | E | F | G | H | I | J |
---|---|---|---|---|---|---|---|
cut | |||||||
Fair | 163 | 224 | 312 | 314 | 303 | 175 | 119 |
Good | 662 | 933 | 909 | 871 | 702 | 522 | 307 |
Ideal | 2834 | 3903 | 3826 | 4884 | 3115 | 2093 | 896 |
Premium | 1603 | 2337 | 2331 | 2924 | 2360 | 1428 | 808 |
Very Good | 1513 | 2400 | 2164 | 2299 | 1824 | 1204 | 678 |
pd.crosstab(diamonds.cut,diamonds.color,margins=True)
color | D | E | F | G | H | I | J | All |
---|---|---|---|---|---|---|---|---|
cut | ||||||||
Fair | 163 | 224 | 312 | 314 | 303 | 175 | 119 | 1610 |
Good | 662 | 933 | 909 | 871 | 702 | 522 | 307 | 4906 |
Ideal | 2834 | 3903 | 3826 | 4884 | 3115 | 2093 | 896 | 21551 |
Premium | 1603 | 2337 | 2331 | 2924 | 2360 | 1428 | 808 | 13791 |
Very Good | 1513 | 2400 | 2164 | 2299 | 1824 | 1204 | 678 | 12082 |
All | 6775 | 9797 | 9542 | 11292 | 8304 | 5422 | 2808 | 53940 |
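Counts are often easier to compare as proportions. `pd.crosstab` accepts a `normalize` argument for this; the sketch below uses made-up data and computes each row's shares.

```python
import pandas as pd

# Toy data standing in for the cut and color columns
df = pd.DataFrame({
    "cut": ["Fair", "Fair", "Good", "Good", "Good", "Ideal"],
    "color": ["D", "E", "D", "D", "E", "E"],
})

# normalize='index' divides each row by its row total
shares = pd.crosstab(df["cut"], df["color"], normalize="index")
print(shares)
```

Using `normalize='columns'` instead would give the shares within each color.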
cutgroup = diamonds.groupby('cut')  # the top-level pd.groupby() is no longer available; use the DataFrame method
cutgroup
<pandas.core.groupby.DataFrameGroupBy object at 0xae00d54c>
cutgroup.price.median()
cut
Fair         3282.0
Good         3050.5
Ideal        1810.0
Premium      3185.0
Very Good    2648.0
Name: price, dtype: float64
cutgroup.price.median().reset_index()
| | cut | price |
---|---|---|
0 | Fair | 3282.0 |
1 | Good | 3050.5 |
2 | Ideal | 1810.0 |
3 | Premium | 3185.0 |
4 | Very Good | 2648.0 |
d=cutgroup.price.median().reset_index()
d.transpose()
| | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
cut | Fair | Good | Ideal | Premium | Very Good |
price | 3282 | 3050.5 | 1810 | 3185 | 2648 |
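A grouped column can also report several statistics at once through `agg`. This is a minimal sketch on made-up prices, not the diamonds data.

```python
import pandas as pd

# Toy data standing in for the cut and price columns
df = pd.DataFrame({
    "cut": ["Fair", "Fair", "Good", "Good", "Good"],
    "price": [3000, 3500, 2000, 2500, 4500],
})

# One row per group, one column per statistic
summary = df.groupby("cut")["price"].agg(["count", "median", "mean"])
print(summary)
```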
diamonds.groupby(['cut', "color"])
<pandas.core.groupby.DataFrameGroupBy object at 0xad6845ac>
diamonds.groupby(['cut', "color"]).price.median().reset_index()
| | cut | color | price |
---|---|---|---|
0 | Fair | D | 3730.0 |
1 | Fair | E | 2956.0 |
2 | Fair | F | 3035.0 |
3 | Fair | G | 3057.0 |
4 | Fair | H | 3816.0 |
5 | Fair | I | 3246.0 |
6 | Fair | J | 3302.0 |
7 | Good | D | 2728.5 |
8 | Good | E | 2420.0 |
9 | Good | F | 2647.0 |
10 | Good | G | 3340.0 |
11 | Good | H | 3468.5 |
12 | Good | I | 3639.5 |
13 | Good | J | 3733.0 |
14 | Ideal | D | 1576.0 |
15 | Ideal | E | 1437.0 |
16 | Ideal | F | 1775.0 |
17 | Ideal | G | 1857.5 |
18 | Ideal | H | 2278.0 |
19 | Ideal | I | 2659.0 |
20 | Ideal | J | 4096.0 |
21 | Premium | D | 2009.0 |
22 | Premium | E | 1928.0 |
23 | Premium | F | 2841.0 |
24 | Premium | G | 2745.0 |
25 | Premium | H | 4511.0 |
26 | Premium | I | 4640.0 |
27 | Premium | J | 5063.0 |
28 | Very Good | D | 2310.0 |
29 | Very Good | E | 1989.5 |
30 | Very Good | F | 2471.0 |
31 | Very Good | G | 2437.0 |
32 | Very Good | H | 3734.0 |
33 | Very Good | I | 3888.0 |
34 | Very Good | J | 4113.0 |
e=diamonds.groupby(['cut', "color"]).price.median().reset_index()
e.pivot(index='cut', columns='color', values='price')
color | D | E | F | G | H | I | J |
---|---|---|---|---|---|---|---|
cut | |||||||
Fair | 3730.0 | 2956.0 | 3035 | 3057.0 | 3816.0 | 3246.0 | 3302 |
Good | 2728.5 | 2420.0 | 2647 | 3340.0 | 3468.5 | 3639.5 | 3733 |
Ideal | 1576.0 | 1437.0 | 1775 | 1857.5 | 2278.0 | 2659.0 | 4096 |
Premium | 2009.0 | 1928.0 | 2841 | 2745.0 | 4511.0 | 4640.0 | 5063 |
Very Good | 2310.0 | 1989.5 | 2471 | 2437.0 | 3734.0 | 3888.0 | 4113 |
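The group-then-pivot sequence can also be written in one step with `pivot_table`, which aggregates and reshapes together. The sketch below uses made-up data and checks that both routes give the same table.

```python
import pandas as pd

# Toy data standing in for cut, color and price
df = pd.DataFrame({
    "cut": ["Fair", "Fair", "Fair", "Good", "Good", "Good"],
    "color": ["D", "D", "E", "D", "E", "E"],
    "price": [3000, 4000, 2500, 2000, 1500, 2500],
})

# Two steps: group and take the median, then pivot to a wide table
two_step = (
    df.groupby(["cut", "color"])["price"].median().reset_index()
    .pivot(index="cut", columns="color", values="price")
)

# One step: pivot_table does the aggregation itself
one_step = df.pivot_table(
    index="cut", columns="color", values="price", aggfunc="median"
)

print(one_step)
```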
f=e.pivot(index='cut', columns='color', values='price')
f>4000
color | D | E | F | G | H | I | J |
---|---|---|---|---|---|---|---|
cut | |||||||
Fair | False | False | False | False | False | False | False |
Good | False | False | False | False | False | False | False |
Ideal | False | False | False | False | False | False | True |
Premium | False | False | False | False | True | True | True |
Very Good | False | False | False | False | False | False | True |
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('ggplot')  # the older pd.options.display.mpl_style option has been removed from pandas
!sudo pip install seaborn --upgrade
The directory '/home/ajay/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag. You are using pip version 7.1.0, however version 7.1.2 is available. You should consider upgrading via the 'pip install --upgrade pip' command. The directory '/home/ajay/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag. Collecting seaborn Downloading seaborn-0.6.0.tar.gz (145kB) 100% |████████████████████████████████| 147kB 123kB/s Installing collected packages: seaborn Found existing installation: seaborn 0.5.1 Uninstalling seaborn-0.5.1: Successfully uninstalled seaborn-0.5.1 Running setup.py install for seaborn Successfully installed seaborn-0.6.0
diamonds['price'].plot()
<matplotlib.axes._subplots.AxesSubplot at 0xa75290ec>
plt.hist(diamonds.price)
(array([ 25335., 9328., 7393., 3878., 2364., 1745., 1306., 1002., 863., 726.]), array([ 326. , 2175.7, 4025.4, 5875.1, 7724.8, 9574.5, 11424.2, 13273.9, 15123.6, 16973.3, 18823. ]), <a list of 10 Patch objects>)
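The two arrays in this output are the bin counts and the bin edges. The same numbers can be obtained without drawing anything by calling `np.histogram`. The sketch below uses randomly generated stand-in prices, so its numbers will not match the diamonds output.

```python
import numpy as np

# Made-up, right-skewed prices (seed chosen arbitrarily)
rng = np.random.default_rng(0)
prices = rng.exponential(scale=4000, size=1000) + 326

# 10 equal-width bins between the minimum and maximum value
counts, edges = np.histogram(prices, bins=10)

print(counts)  # how many prices fall in each bin
print(edges)   # 11 edges delimit 10 bins
```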
plt.figure();
diamonds['price'].plot(kind='hist', stacked=True, bins=20)
<matplotlib.axes._subplots.AxesSubplot at 0x98f2bcac>
plt.boxplot(diamonds.price)
{'boxes': [<matplotlib.lines.Line2D at 0xa75082cc>], 'caps': [<matplotlib.lines.Line2D at 0xa750a22c>, <matplotlib.lines.Line2D at 0xa750abcc>], 'fliers': [<matplotlib.lines.Line2D at 0xa750ff2c>], 'means': [], 'medians': [<matplotlib.lines.Line2D at 0xa750f58c>], 'whiskers': [<matplotlib.lines.Line2D at 0xa7508eec>, <matplotlib.lines.Line2D at 0xa750986c>]}
plt.figure();
diamonds['price'].plot(kind='box')
<matplotlib.axes._subplots.AxesSubplot at 0x961d1a0c>
diamonds.plot(kind='hexbin', x='price', y='carat', gridsize=8)
<matplotlib.axes._subplots.AxesSubplot at 0x9552b92c>
from ggplot import *
p = ggplot(aes(x='price', y='carat',color="clarity"), data=diamonds)
p + geom_point()
<ggplot: (-918646034)>
p = ggplot(aes(x='price', y='carat',color="cut"), data=diamonds)
p + geom_point()
<ggplot: (-918646060)>
Let's do some basic regression modeling
import statsmodels.formula.api as sm
result = sm.ols(formula="price ~ carat + color", data=diamonds).fit()
result.summary()
Dep. Variable: | price | R-squared: | 0.864 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.864 |
Method: | Least Squares | F-statistic: | 4.893e+04 |
Date: | Wed, 02 Dec 2015 | Prob (F-statistic): | 0.00 |
Time: | 23:44:07 | Log-Likelihood: | -4.6998e+05 |
No. Observations: | 53940 | AIC: | 9.400e+05 |
Df Residuals: | 53932 | BIC: | 9.400e+05 |
Df Model: | 7 | ||
Covariance Type: | nonrobust |
| | coef | std err | t | P>\|t\| | [95.0% Conf. Int.] |
---|---|---|---|---|---|
Intercept | -2136.2289 | 20.122 | -106.162 | 0.000 | -2175.669 -2096.789 |
color[T.E] | -93.7813 | 23.252 | -4.033 | 0.000 | -139.355 -48.208 |
color[T.F] | -80.2629 | 23.405 | -3.429 | 0.001 | -126.136 -34.390 |
color[T.G] | -85.5363 | 22.670 | -3.773 | 0.000 | -129.969 -41.103 |
color[T.H] | -732.2418 | 24.354 | -30.067 | 0.000 | -779.975 -684.508 |
color[T.I] | -1055.7319 | 27.310 | -38.657 | 0.000 | -1109.260 -1002.203 |
color[T.J] | -1914.4722 | 33.777 | -56.679 | 0.000 | -1980.676 -1848.268 |
carat | 8066.6230 | 14.040 | 574.558 | 0.000 | 8039.105 8094.141 |
Omnibus: | 12266.990 | Durbin-Watson: | 0.948 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 165317.069 |
Skew: | 0.719 | Prob(JB): | 0.00 |
Kurtosis: | 11.455 | Cond. No. | 11.0 |
result.params
Intercept    -2136.228853
color[T.E]     -93.781288
color[T.F]     -80.262858
color[T.G]     -85.536282
color[T.H]    -732.241826
color[T.I]   -1055.731857
color[T.J]   -1914.472203
carat         8066.623019
dtype: float64
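To see what these coefficients mean, here is a sketch that applies them by hand. The values are copied from the output above; the helper `predict_price` is made up for illustration. Color D has no coefficient of its own because it is the baseline level that the other colors are compared against.

```python
# Coefficients copied from result.params above
params = {
    "Intercept": -2136.228853,
    "color[T.E]": -93.781288,
    "color[T.F]": -80.262858,
    "color[T.G]": -85.536282,
    "color[T.H]": -732.241826,
    "color[T.I]": -1055.731857,
    "color[T.J]": -1914.472203,
    "carat": 8066.623019,
}


def predict_price(carat, color):
    """Predicted price = intercept + color effect + carat coefficient * carat.

    Color 'D' is the baseline, so its effect is 0. Note that any color not
    in the table is also treated as baseline in this simple sketch.
    """
    color_effect = params.get("color[T.%s]" % color, 0.0)
    return params["Intercept"] + color_effect + params["carat"] * carat


print(predict_price(1.0, "D"))
print(predict_price(1.0, "E"))
```

In practice `result.predict` does this for you; the manual version only shows where the numbers come from.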