Show different ways to present statistical data.
This script is written in MATLAB or IPpython style, to show how
best to use Python interactively.
Note than in IPython, the show()
commands are automatically generated.
The examples contain:
Author: thomas haslwanter, March-2015
First, import the libraries that you are going to need. You could also do that later, but it is better style to do that at the beginning. pylab imports the numpy, scipy, and matplotlib.pyplot libraries into the current environment
%pylab inline
import scipy.stats as stats
import seaborn as sns
sns.set_context('notebook')
sns.set_style('darkgrid')
Populating the interactive namespace from numpy and matplotlib
# Generate data that are normally distributed
x = randn(50)
plot(x,'.')
title('Scatter Plot')
xlabel('X')
ylabel('Y')
draw()
hist(x)
xlabel('Data Values')
ylabel('Frequency')
title('Histogram, default settings')
<matplotlib.text.Text at 0x24320fd3550>
x = randn(1000)
hist(x,25)
xlabel('Data Values')
ylabel('Frequency')
title('Histogram, 25 bins')
<matplotlib.text.Text at 0x2432172c978>
Kernel Density Estimation (KDE)
import seaborn as sns
sns.kdeplot(x)
xlabel('Data Values')
ylabel('Density')
<matplotlib.text.Text at 0x243215c6e10>
numbins = 20
cdf = stats.cumfreq(x,numbins)
plot(cdf[0])
xlabel('Data Values')
ylabel('Cumulative Frequency')
title('Cumulative probablity density function')
<matplotlib.text.Text at 0x24320f4f9b0>
# The error bars indicate 1.5* the inter-quartile-range (IQR), and the box consists of the
# first, second (middle) and third quartile
boxplot(x, sym='o')
title('Boxplot')
ylabel('Values')
<matplotlib.text.Text at 0x24320fe1550>
boxplot(x, vert=False, sym='*')
title('Boxplot, horizontal')
xlabel('Values')
<matplotlib.text.Text at 0x24321737748>
x = arange(5)
y = x**2
errorBar = x/2
errorbar(x,y, yerr=errorBar, fmt='o', capsize=5, capthick=3)
plt.xlabel('Data Values')
plt.ylabel('Measurements')
plt.title('Errorbars')
xlim([-0.2, 4.2])
ylim([-0.2, 19])
(-0.2, 19)
# Visual check
x = randn(100)
_ = stats.probplot(x, plot=plt)
title('Probplot - check for normality')
<matplotlib.text.Text at 0x24320df7438>
# Generate data
x = randn(200)
y = 10+0.5*x+randn(len(x))
# Scatter plot
scatter(x,y)
# This one is quite similar to "plot(x,y,'.')"
title('Scatter plot of data')
xlabel('X')
ylabel('Y')
<matplotlib.text.Text at 0x24320e67dd8>
M = vstack((ones(len(x)), x)).T
pars = linalg.lstsq(M,y)[0]
intercept = pars[0]
slope = pars[1]
scatter(x,y)
plot(x, intercept + slope*x, 'r')
show()