The goal of this lesson is to introduce Python libraries, load and look at a dataset, and begin some common statistical testing.
Most of the common statistical models (t-test, correlation, ANOVA, chi-square, etc.) are special cases of linear models, or a very close approximation. This beautiful simplicity means that there is less to learn. In particular, it all comes down to $y = a \cdot x + b$, where $a$ is the slope of the line and $b$ is the y-intercept where the line crosses the y-axis.
There are certain assumptions we check in order to use these "parametric tests." When assumptions are not met, we have alternative "non-parametric" counterparts. We will think of "non-parametric" tests as ranked versions of the corresponding parametric tests.
This lesson is adapted from Tests as Linear, which is also available in R.
See the Cheat Sheet
In part 2, we will:
Think of these as useful powerful toolboxes we are opening up on our workbench. Many additional libraries are available from the Python Package Index.
We import some key scientific libraries, sometimes using a shorter alias for ease of coding, with the syntax import library_name as alias_name.
# Python libraries for data structures, arrays, and math
import numpy as np
import pandas as pd
# Libraries for plotting
import matplotlib.pyplot as plt
import seaborn as sns
# Importing the statistics module
import statistics
# Library for hypothesis testing
import scipy
from scipy import stats
# Libraries for regression modeling
import statsmodels.api as sm
import statsmodels.formula.api as smf
# Library for latex
from IPython.display import Latex, display, Markdown
Reading a file loads a copy of the data into our environment. Unlike working with a spreadsheet, it does not affect the original file.
Our data: This synthetic dataset contains information on newborn babies and their parents. It comes from here.
Read a Comma Separated Values (CSV) data file with pd.read_csv().
Need to read in a different file type?
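pandas has a whole family of readers for other formats, e.g. pd.read_excel, pd.read_json, and pd.read_table. A minimal sketch, reading tab-separated text from an in-memory buffer (the values here are made up) so it runs without any file on disk:

```python
import io
import pandas as pd

# pandas has many readers: pd.read_excel, pd.read_json, pd.read_table, ...
# Here we read tab-separated text from an in-memory buffer (made-up
# values) so the example runs without any file on disk.
tsv_text = "ID\tBirthweight\n1\t3.2\n2\t4.1\n"
df = pd.read_table(io.StringIO(tsv_text))  # same as read_csv(..., sep="\t")

print(df.shape)  # (2, 2)
```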
# url is the name and path of the data file
url = "https://raw.githubusercontent.com/DeisData/datasets/main/Birthweight_reduced_kg_R.csv"
# data is the name of the DataFrame we are storing our data in
# pd is pandas and read_csv is a tool in pandas for reading in a csv file
data = pd.read_csv(url)
There are lots of functions and methods we can apply to the dataframe to start inspecting it.
# Method to view head of dataset
data.head()
|  | ID | Length | Birthweight | Headcirc | Gestation | smoker | mage | mnocig | mheight | mppwt | fage | fedyrs | fnocig | fheight | lowbwt | mage35 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1360 | 56 | 4.55 | 34 | 44 | 0 | 20 | 0 | 162 | 57 | 23 | 10 | 35 | 179 | 0 | 0 |
| 1 | 1016 | 53 | 4.32 | 36 | 40 | 0 | 19 | 0 | 171 | 62 | 19 | 12 | 0 | 183 | 0 | 0 |
| 2 | 462 | 58 | 4.10 | 39 | 41 | 0 | 35 | 0 | 172 | 58 | 31 | 16 | 25 | 185 | 0 | 1 |
| 3 | 1187 | 53 | 4.07 | 38 | 44 | 0 | 20 | 0 | 174 | 68 | 26 | 14 | 25 | 189 | 0 | 0 |
| 4 | 553 | 54 | 3.94 | 37 | 42 | 0 | 24 | 0 | 175 | 66 | 30 | 12 | 0 | 184 | 0 | 0 |
This dataset contains information on newborn babies and their parents. It contains mostly continuous variables (although some have only a few values, e.g. number of cigarettes smoked per day) and is most useful for correlation and regression.
Main dependent variable = Birthweight (lbs)
| Name | Variable | Data type |
|---|---|---|
| ID | Baby number | |
| length | Length of baby (cm) | Scale |
| Birthweight | Weight of baby (kg) | Scale |
| headcircumference | Head circumference | Scale |
| Gestation | Gestation (weeks) | Scale |
| smoker | Mother smokes: 1 = smoker, 0 = non-smoker | Binary |
| motherage | Maternal age | Scale |
| mnocig | Number of cigarettes smoked per day by mother | Scale |
| mheight | Mother's height (cm) | Scale |
| mppwt | Mother's pre-pregnancy weight (kg) | Scale |
| fage | Father's age | Scale |
| fedyrs | Father's years in education | Scale |
| fnocig | Number of cigarettes smoked per day by father | Scale |
| fheight | Father's height (cm) | Scale |
| lowbwt | Low birth weight: 0 = No, 1 = Yes | Binary |
| mage35 | Mother over 35: 0 = No, 1 = Yes | Binary |
Your turn to explore! Uncomment and try more Pandas methods we can apply to our dataframe:
# TRY THESE!
# Method to see the tail end of the data
# data.tail()
# Use the DataFrame.info() method to find out more about a dataframe.
# data.info()
# Get Dimensions (rows, columns)
# data.shape
# Get data types
# data.dtypes
# The DataFrame.columns variable stores information about the dataframe’s columns.
# data.columns
# Use DataFrame.describe() to get summary statistics about data.
# data.describe()
When we do statistical testing, we form a null hypothesis stating that there is no relationship: either between the two variables being studied (one variable does not affect the other), or between a variable and some constant such as a population mean.
The $p$-value, or probability value, returned with your statistical test is a number describing how likely it is that your data would have occurred by random chance (i.e. that the null hypothesis is true). It is a number between 0 and 1. The smaller the p-value, the stronger the evidence that you should reject the null hypothesis. A common threshold of significance is $p < 0.05$.
Image from What a p-value tells you about statistical significance
There is a lot of conversation around use and mis-use of $p$-values in scientific analysis. Here are a few articles you may find interesting:
Model: the recipe for $y$ is a slope ($\beta_1$) times $x$ plus a y-intercept ($\beta_0$): a straight line.
$y = \beta_0 + \beta_1 x \qquad \mathcal{H}_0: \beta_1 = 0$
... which is a mathy way of writing the good old $y = ax + b$ (here ordered as $y = b + ax$). The task of linear models is simply to find the numbers that best predict $y$.
Either way you write it, it's an intercept ($\beta_0$) and a slope ($\beta_1$) yielding a straight line.
With our data, we can use the seaborn regression plot tool sns.regplot to quickly visualize a linear model fit between mother's height (mheight) and baby length (Length) when asking the question: "Can mother's height predict baby length?"
sns.regplot(x='mheight',y='Length',data=data)
<AxesSubplot:xlabel='mheight', ylabel='Length'>
Practice for you. Use the seaborn regression tool to visualize a simple linear regression to consider the question: "Can gestational age (Gestation) predict baby weight (Birthweight)?"
# Your code goes here!:
#sns.regplot()
This is often simply called a regression model, which can be extended to multiple regression, where several $\beta$s multiply the predictors on the right-hand side.
Everything below, from the one-sample t-test to the two-way ANOVA, is just a special case of this system. Nothing more, nothing less.
We use scipy.stats.pearsonr to calculate a Pearson correlation coefficient and the p-value for testing non-correlation between two variables. Correlations of $-1$ or $+1$ imply an exact linear relationship, and $0$ implies no correlation.
# pearsonr returns the pearson's correlation coefficient and
# the 2-tailed p-value
stats.pearsonr(data.mheight, data.Length)
(0.4849924026326507, 0.0011331311458135573)
The p-value roughly indicates the probability of an uncorrelated system producing datasets that have a Pearson correlation at least as extreme as the one computed from these datasets.
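Under the hood, Pearson's $r$ is just the covariance of $x$ and $y$ scaled by the product of their standard deviations. A quick sketch on synthetic data (made-up stand-ins for the mheight and Length columns) shows the hand computation matching scipy:

```python
import numpy as np
from scipy import stats

# Synthetic stand-ins for mheight and Length (hypothetical values).
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 0.5 * x + rng.normal(scale=0.5, size=50)

# Pearson's r "by hand": covariance scaled by the standard deviations.
r_manual = np.cov(x, y)[0, 1] / (np.std(x, ddof=1) * np.std(y, ddof=1))
r_scipy, p = stats.pearsonr(x, y)

print(np.isclose(r_manual, r_scipy))  # True
```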
Practice for you. Calculate the Pearson's R for the question: "Can gestational age (Gestation) predict baby weight (Birthweight)?"
# Your code goes here!:
As the name implies, the Spearman rank correlation is a Pearson correlation on rank-transformed $x$ and $y$:
$\text{rank}(y) = \beta_0 + \beta_1 \cdot \text{rank}(x) \qquad \mathcal{H}_0: \beta_1 = 0$
I'll introduce ranks in a minute. For now, notice that the correlation coefficient of the linear model is identical to a "real" Spearman correlation, but the p-values are an approximation, which is appropriate for samples greater than N = 10 and almost perfect when N > 20.
Such a nice and non-mysterious equivalence!
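We can check that equivalence directly: running stats.pearsonr on rank-transformed data reproduces the Spearman coefficient exactly. A sketch with synthetic (made-up) monotone-but-non-linear data:

```python
import numpy as np
from scipy import stats

# Synthetic monotone-but-non-linear data (hypothetical values).
rng = np.random.default_rng(1)
x = rng.normal(size=30)
y = x ** 3 + rng.normal(scale=0.1, size=30)

rho, _ = stats.spearmanr(x, y)
r_on_ranks, _ = stats.pearsonr(stats.rankdata(x), stats.rankdata(y))

print(np.isclose(rho, r_on_ranks))  # True
```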
The Spearman rank-order correlation coefficient is a nonparametric measure of the relationship between two variables. Unlike the Pearson correlation, the Spearman correlation does not assume that both datasets are normally distributed.
We will use scipy.stats.spearmanr to calculate the Spearman correlation coefficient with associated p-value.
# spearmanr returns the rank correlation coefficient and
# the 2-tailed p-value
stats.spearmanr(data.mheight, data.Length)
SpearmanrResult(correlation=0.4553274463786581, pvalue=0.002446213023703486)
Practice for you. Calculate the Spearman's R for the question: "Can gestational age (Gestation) predict baby weight (Birthweight)?"
# Your code goes here!:
scipy.stats.rankdata simply takes an array of numbers and "replaces" them with the integers of their rank (1st smallest, 2nd smallest, 3rd smallest, etc.). pd.DataFrame.rank performs a similar function, but with support for pandas DataFrames. So the result of the rank-transformation scipy.stats.rankdata([3.6, 3.4, -5.0, 8.2]) is [3, 2, 1, 4]. See that in the figure above?

A signed rank is the same, except that we rank according to absolute size first and then re-apply the sign. So the signed rank here would be [2, 1, -3, 4]. Or in code:
def signed_rank(df):
return np.sign(df) * df.abs().rank()
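The example values above can be checked directly (re-defining the signed_rank helper so the snippet is self-contained):

```python
import numpy as np
import pandas as pd
from scipy import stats

# Re-defined here so this snippet is self-contained.
def signed_rank(df):
    # Rank by absolute value, then re-apply the original signs
    return np.sign(df) * df.abs().rank()

vals = [3.6, 3.4, -5.0, 8.2]
print(stats.rankdata(vals))                   # [3. 2. 1. 4.]
print(signed_rank(pd.Series(vals)).tolist())  # [2.0, 1.0, -3.0, 4.0]
```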
Ranks are all you need to do to convert most parametric tests into their "non-parametric" counterparts!
sns.pairplot() from seaborn combines joint and marginal views — but rather than focusing on a single relationship, it visualizes every pairwise combination of variables simultaneously:
# Uncomment and run to see a matrix of all relationships, color-coded by smoker / non-smoker
# sns.pairplot(data=data, hue="smoker")
The t-test and Wilcoxon signed-rank test calculate probability for the mean of ONE group of scores. This tests the null hypothesis that the expected value (mean) of a sample of independent observations is equal to a given population mean, popmean.
t-test model: A single number predicts $y$.
$y = \beta_0 \qquad \mathcal{H}_0: \beta_0 = 0$
In other words, it's our good old $y = \beta_0 + \beta_1 \cdot x$ where the last term is gone since there is no $x$ (essentially $x=0$).
The same is, to a very close approximation, true for the Wilcoxon signed-rank test, just with the signed ranks of $y$ instead of $y$ itself.
$\text{signed_rank}(y) = \beta_0$
We will use scipy.stats.ttest_1samp for the parametric comparison test, and scipy.stats.wilcoxon for the non-parametric comparison test.

Try running the Python code below and see that the linear model (smf.ols) produces the same $t$, $p$, and $r$ as scipy.stats.ttest_1samp.
First, we extract the subsets of data for smokers and non-smokers.
# Extract subsets for non-smoking and smoking mothers
data_nonsmokers = data[data.smoker==0]
data_smokers = data[data.smoker==1]
Next, we will calculate the average birthweight for each group.
avg_birthwgt_nonsmoker = statistics.mean(data_nonsmokers.Birthweight)
avg_birthwgt_smoker = statistics.mean(data_smokers.Birthweight)
print(avg_birthwgt_nonsmoker, avg_birthwgt_smoker)
3.5095 3.1340909090909093
Let's compare the data for smokers to the average birthweight from a non-smoking mother.
t, p = stats.ttest_1samp(data_smokers.Birthweight, avg_birthwgt_nonsmoker)
print(t,p)
-2.7894379594885215 0.010986630284506064
The equivalent intercept-only linear model is fit with smf.ols, the tool for ordinary least squares linear regression. To make an equivalent comparison, we shift the birthweight data by the average birthweight for non-smokers, and then see what the linear model looks like!
# We will shift the birthweight data by the average for non-smoker
# and store the result in a new column, Birthweight_adj
data["Birthweight_adj"] = data["Birthweight"]-avg_birthwgt_nonsmoker
data_smokers.head()
|  | ID | Length | Birthweight | Headcirc | Gestation | smoker | mage | mnocig | mheight | mppwt | fage | fedyrs | fnocig | fheight | lowbwt | mage35 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 20 | 792 | 53 | 3.64 | 38 | 40 | 1 | 20 | 2 | 170 | 59 | 24 | 12 | 12 | 185 | 0 | 0 |
| 21 | 1388 | 51 | 3.14 | 33 | 41 | 1 | 22 | 7 | 160 | 53 | 24 | 16 | 12 | 176 | 0 | 0 |
| 22 | 575 | 50 | 2.78 | 30 | 37 | 1 | 19 | 7 | 165 | 60 | 20 | 14 | 0 | 183 | 0 | 0 |
| 23 | 569 | 50 | 2.51 | 35 | 39 | 1 | 22 | 7 | 159 | 52 | 23 | 14 | 25 | 200 | 1 | 0 |
| 24 | 1363 | 48 | 2.37 | 30 | 37 | 1 | 20 | 7 | 163 | 47 | 20 | 10 | 35 | 185 | 1 | 0 |
# Equivalent linear model: intercept-only
res = smf.ols(formula="Birthweight_adj ~ 1", data=data[data.smoker==1]).fit()
print(res.summary())
                            OLS Regression Results
==============================================================================
Dep. Variable:        Birthweight_adj   R-squared:                      -0.000
Model:                            OLS   Adj. R-squared:                 -0.000
Method:                 Least Squares   F-statistic:                       nan
Date:                Thu, 26 Aug 2021   Prob (F-statistic):                nan
Time:                        13:25:46   Log-Likelihood:                -20.584
No. Observations:                  22   AIC:                             43.17
Df Residuals:                      21   BIC:                             44.26
Df Model:                           0
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.3754      0.135     -2.789      0.011      -0.655      -0.096
==============================================================================
Omnibus:                        0.182   Durbin-Watson:                   1.305
Prob(Omnibus):                  0.913   Jarque-Bera (JB):                0.016
Skew:                           0.021   Prob(JB):                        0.992
Kurtosis:                       2.876   Cond. No.                         1.00
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Notice that the $t$ statistic and $p$ value for the linear model and the t-test are the same!
We use scipy.stats.wilcoxon for the non-parametric comparison test and compare it to the $p$-value for a ranked data linear model.
stats.wilcoxon(data[data.smoker==1].Birthweight_adj)
WilcoxonResult(statistic=51.0, pvalue=0.01272726058959961)
signed_rank_data = signed_rank(data[data.smoker==1])
res = smf.ols("Birthweight_adj ~ 1", data=signed_rank_data).fit()
print(res.summary())
                            OLS Regression Results
==============================================================================
Dep. Variable:        Birthweight_adj   R-squared:                      -0.000
Model:                            OLS   Adj. R-squared:                 -0.000
Method:                 Least Squares   F-statistic:                       nan
Date:                Thu, 26 Aug 2021   Prob (F-statistic):                nan
Time:                        13:25:46   Log-Likelihood:                -84.360
No. Observations:                  22   AIC:                             170.7
Df Residuals:                      21   BIC:                             171.8
Df Model:                           0
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -6.8636      2.443     -2.809      0.011     -11.945      -1.782
==============================================================================
Omnibus:                        1.920   Durbin-Watson:                   1.250
Prob(Omnibus):                  0.383   Jarque-Bera (JB):                1.649
Skew:                           0.603   Prob(JB):                        0.438
Kurtosis:                       2.411   Cond. No.                         1.00
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
A paired t-test or matched pairs test is used when we are interested in the difference between two variables for the same subject. Often the two variables might be measurements that are separated by time (e.g. cholesterol levels before and after a special diet).
t-test model: a single number (intercept) predicts the pairwise differences.
$y_2-y_1 = \beta_0 \qquad \mathcal{H}_0: \beta_0 = 0$
This means that there is just one $y = y_2 - y_1$ to predict and it becomes a one-sample t-test on the pairwise differences. The idea is therefore also the same as for the one-sample t-test.
Similarly, the Wilcoxon matched pairs only differ from Wilcoxon signed-rank in that it's testing the signed ranks of the pairwise $y_2-y_1$ differences.
$\text{signed_rank}(y_2-y_1) = \beta_0 \qquad \mathcal{H}_0: \beta_0 = 0$
Let's load some data with matched pairs from a study that tested whether cholesterol was reduced after using a certain brand of margarine as part of a low-fat, low-cholesterol diet.

The subjects consumed on average 2.31 g per day of the active ingredient, stanol ester. This dataset contains information on 18 people using margarine to reduce cholesterol over three time points.
url = "https://raw.githubusercontent.com/DeisData/datasets/main/Cholesterol_R.csv"
data_matched = pd.read_csv(url)
data_matched.head()
|  | ID | Before | After4weeks | After8weeks | Margarine |
|---|---|---|---|---|---|
| 0 | 1 | 6.42 | 5.83 | 5.75 | B |
| 1 | 2 | 6.76 | 6.20 | 6.13 | A |
| 2 | 3 | 6.56 | 5.83 | 5.71 | B |
| 3 | 4 | 4.80 | 4.27 | 4.15 | A |
| 4 | 5 | 8.43 | 7.71 | 7.67 | B |
# Plot the data for each ID, Before and After 8 weeks
fig, ax = plt.subplots()
sns.scatterplot(x='ID',y='Before',data=data_matched, ax=ax)
sns.scatterplot(x='ID',y='After8weeks',data=data_matched, ax=ax)
ax.set_ylabel('Cholesterol Before and After 8 wks')
ax.set_xlabel('Patient ID')
Text(0.5, 0, 'Patient ID')
t, p = scipy.stats.ttest_rel(data_matched.After8weeks, data_matched.Before)
print(t,p)
-14.945967764585072 3.27857534017563e-11
# Let's add a column to our DataFrame with the difference in cholesterol measures
data_matched["cholesterol_diff"] = data_matched.After8weeks - data_matched.Before
data_matched.head()
|  | ID | Before | After4weeks | After8weeks | Margarine | cholesterol_diff |
|---|---|---|---|---|---|---|
| 0 | 1 | 6.42 | 5.83 | 5.75 | B | -0.67 |
| 1 | 2 | 6.76 | 6.20 | 6.13 | A | -0.63 |
| 2 | 3 | 6.56 | 5.83 | 5.71 | B | -0.85 |
| 3 | 4 | 4.80 | 4.27 | 4.15 | A | -0.65 |
| 4 | 5 | 8.43 | 7.71 | 7.67 | B | -0.76 |
# The ttest_rel is similar to calculating the difference in y values for each match, and
# fitting a linear model
res = smf.ols(formula="cholesterol_diff ~ 1", data=data_matched).fit()
print(res.summary())
                            OLS Regression Results
==============================================================================
Dep. Variable:       cholesterol_diff   R-squared:                       0.000
Model:                            OLS   Adj. R-squared:                  0.000
Method:                 Least Squares   F-statistic:                       nan
Date:                Thu, 26 Aug 2021   Prob (F-statistic):                nan
Time:                        13:25:47   Log-Likelihood:                 5.9885
No. Observations:                  18   AIC:                            -9.977
Df Residuals:                      17   BIC:                            -9.087
Df Model:                           0
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.6289      0.042    -14.946      0.000      -0.718      -0.540
==============================================================================
Omnibus:                        0.404   Durbin-Watson:                   2.273
Prob(Omnibus):                  0.817   Jarque-Bera (JB):                0.371
Skew:                           0.291   Prob(JB):                        0.831
Kurtosis:                       2.605   Cond. No.                         1.00
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
You will notice the $t$ statistics and $p$ values are similar for the paired t-test and a linear fit of the difference in the matched data.
Again, we do the signed-ranks trick. This is still an approximation, but a close one:
stats.wilcoxon(data_matched.After8weeks, data_matched.Before)
WilcoxonResult(statistic=0.0, pvalue=7.62939453125e-06)
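As an aside worth checking, the two-argument form of stats.wilcoxon tests the pairwise differences, so it matches the one-sample form applied to the differences directly. A sketch with synthetic paired data (made-up stand-ins for the Before and After8weeks columns):

```python
import numpy as np
from scipy import stats

# Synthetic paired measurements (hypothetical, standing in for the
# Before / After8weeks columns).
rng = np.random.default_rng(2)
before = rng.normal(loc=6.0, scale=1.0, size=18)
after = before - 0.6 + rng.normal(scale=0.1, size=18)

# The two-argument form tests the pairwise differences...
res_two = stats.wilcoxon(after, before)
# ...so it matches the one-sample form on after - before.
res_one = stats.wilcoxon(after - before)

print(res_two.pvalue == res_one.pvalue)  # True
```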
Practice for you. Compare the matched data after 4 weeks and after 8 weeks. What are the results of the paired sample t-test? Of the Wilcoxon matched pairs test?
# Your code goes here!:
ANOVAs are linear models with categorical predictors. The one-way ANOVA tests the null hypothesis that two or more groups have the same population mean. The test is applied to samples from two or more groups, possibly with differing sizes.
Model: One mean for each group predicts $y$.
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 +... \qquad \mathcal{H}_0: y = \beta_0$
where the $x_i$ are indicator variables ($x_i=0$ or $x_i=1$), and at most one $x_i=1$ while all others are $x_i=0$.
Notice how this is just "more of the same" of what we already did in other models above. When there are only two groups, this model is $y = \beta_0 + \beta_1 \cdot x$, i.e. the independent t-test. If there is only one group, it is $y = \beta_0$, i.e. the one-sample t-test.
Since we now regress on more than one $x$, the one-way ANOVA is a multiple regression model.
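The two-group special case mentioned above is easy to check: with exactly two groups, the one-way ANOVA $F$ statistic is the square of the (equal-variance) independent-samples $t$ statistic, with identical p-values. A sketch on synthetic (made-up) groups:

```python
import numpy as np
from scipy import stats

# Synthetic stand-ins for two groups (hypothetical values).
rng = np.random.default_rng(3)
g1 = rng.normal(loc=3.5, scale=0.5, size=20)
g2 = rng.normal(loc=3.1, scale=0.5, size=22)

t, p_t = stats.ttest_ind(g1, g2)  # equal-variance t-test (the default)
F, p_f = stats.f_oneway(g1, g2)   # one-way ANOVA on the same two groups

print(np.isclose(t ** 2, F), np.isclose(p_t, p_f))  # True True
```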
The Kruskal-Wallis test is simply a one-way ANOVA on the rank-transformed $y$:
$\text{rank}(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 +...$
This approximation is good enough for 12 or more data points.
We make a two-level factor with the Margarine levels A and B, so that the one-way ANOVA basically becomes a "two-sample t-test". We will test the null hypothesis that the two groups have the same population mean of cholesterol level after 8 weeks.
data_matched.head()
|  | ID | Before | After4weeks | After8weeks | Margarine |
|---|---|---|---|---|---|
| 0 | 1 | 6.42 | 5.83 | 5.75 | B |
| 1 | 2 | 6.76 | 6.20 | 6.13 | A |
| 2 | 3 | 6.56 | 5.83 | 5.71 | B |
| 3 | 4 | 4.80 | 4.27 | 4.15 | A |
| 4 | 5 | 8.43 | 7.71 | 7.67 | B |
The ANOVA test has important assumptions that must be satisfied in order for the associated p-value to be valid. We will see how to test these in 6. Checking Assumptions:
- The samples are independent.
- Each sample is from a normally distributed population (KS test).
- The population standard deviations of the groups are all equal; this property is known as homoscedasticity (Bartlett test).
Let's see scipy's dedicated ANOVA function, scipy.stats.f_oneway.
# Extract subsets of data
data_matched_A = data_matched[data_matched.Margarine=='A']
data_matched_B = data_matched[data_matched.Margarine=='B']
scipy.stats.f_oneway(data_matched_A.After8weeks,
data_matched_B.After8weeks)
F_onewayResult(statistic=1.2662631426081903, pvalue=0.27706640602454957)
If ANOVA assumptions are not met, we can consider a non-parametric test.
The Kruskal-Wallis H-test (scipy.stats.kruskal) tests the null hypothesis that the population medians of all of the groups are equal.
# Notice here, we have combined the steps of extracting subsets of data,
# accessing the .After8weeks column from the subset,
# and doing the Kruskal test! Python is powerful!
stats.kruskal(data_matched[data_matched.Margarine=='A'].After8weeks,
data_matched[data_matched.Margarine=='B'].After8weeks)
# Python starts working inside the parentheses, just like algebra.
KruskalResult(statistic=1.0311890838206494, pvalue=0.3098795496397732)
We do not reject the hypothesis that median cholesterol levels at 8 weeks are equal for the Margarine A and Margarine B groups.
Think back to the Birthweight data. Can you set up a test of equal means for the birthweights measured from four groups of mothers who are:
# Try it here!:
Parametric tests like the t-tests and ANOVA have important assumptions that must be satisfied in order for the associated p-value to be valid, including:
Here are just a few tools to check some of these assumptions.
Baby weight is normally distributed (i.e. bell-shaped). We can check this visually with a histogram (sns.histplot) and with the Kolmogorov-Smirnov test (scipy.stats.kstest).
# Are the samples normally distributed?
sns.histplot(data.Birthweight)
stats.kstest(data.Birthweight,'norm')
KstestResult(statistic=0.9725710502961632, pvalue=5.07943581694757e-66)
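One scipy behavior worth knowing here: passing 'norm' alone compares the sample against a standard normal (mean 0, standard deviation 1), so kg-scale birthweights are rejected largely for their location and scale rather than their shape. Standardizing first (or passing args=(mean, sd)) tests the shape instead. A sketch with synthetic stand-in data:

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the Birthweight column (not the real data).
rng = np.random.default_rng(4)
weights = rng.normal(loc=3.3, scale=0.6, size=42)

# Compared against a *standard* normal (mean 0, sd 1), kg-scale data
# are rejected for their location/scale, not their shape.
raw = stats.kstest(weights, 'norm')

# Standardize first (or pass args=(mean, sd)) to test shape instead.
z = (weights - weights.mean()) / weights.std(ddof=1)
standardized = stats.kstest(z, 'norm')

print(raw.pvalue < standardized.pvalue)  # True
```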
Your turn: Is the number of cigarettes smoked per day normally distributed?
# Try it here!:
The population standard deviations of the groups are all equal. This property is known as homoscedasticity. One of several tests available to check this is the Bartlett test, scipy.stats.bartlett.
# Are the standard deviations equal for the smokers and non-smokers birthweight data?
stats.bartlett(data_smokers.Birthweight, data_nonsmokers.Birthweight)
BartlettResult(statistic=0.743871469606483, pvalue=0.38842352374096334)
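Worth knowing: Bartlett's test is sensitive to departures from normality; scipy also provides stats.levene as a more robust alternative when normality is doubtful. A sketch on synthetic stand-ins for the two groups:

```python
import numpy as np
from scipy import stats

# Synthetic stand-ins for the smoker / non-smoker groups (made-up values).
rng = np.random.default_rng(5)
group_a = rng.normal(loc=3.5, scale=0.5, size=20)
group_b = rng.normal(loc=3.1, scale=0.5, size=22)

bart = stats.bartlett(group_a, group_b)  # assumes normal data
lev = stats.levene(group_a, group_b)     # more robust to non-normality
print(bart.pvalue, lev.pvalue)
```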
Practice for you: Are the standard deviations of birthweight data equal for the groups of over 35 and under 35 mothers?
# Try it here:
Key Points
Hypothesis testing and p-values give you the significance of an effect / difference.
Parametric and Non-parametric tests are special cases of linear models!
Regressions enable you to express rich links in your data.
Visualizing your data and fitting simple models give insight into the data.