Matplotlib is the "grandfather" of data visualisation tools in Python - it offers unparalleled control over graphs and diagrams, and lets us annotate and customise figures to our heart's content. Other important modules we'll use later are built on top of Matplotlib, such as Seaborn, which is used more for statistical visualisation.
All the documentation for Matplotlib can be found on the official Matplotlib website.
We can create quick and dirty graphs using the functional method as we've seen before. This method is simpler but won't allow us to customise our plots as much.
First, we need to import our modules:
import matplotlib.pyplot as plt
import numpy as np
If we're running in a notebook, we can run the following line of code so that our graphs appear without us having to write plt.show() each time. Spyder shows plots automatically, so there we don't need this line of code and everything will still work as normal.
%matplotlib inline
We can then graph any data we want using the plt.plot command. Here we will use the np.linspace function to generate a NumPy array called x with 101 equally spaced points between 0 and 10, and another NumPy array called y that is simply x squared. (Remember that operations on NumPy arrays are applied element-wise.)
x = np.linspace(0,10,101)
y = x**2
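As a quick sanity check on those two arrays, here is a minimal sketch that confirms the number of points, the spacing between them, and that squaring really is applied to each element in turn. It only uses the x and y defined above and nothing new from Matplotlib.

```python
import numpy as np

x = np.linspace(0, 10, 101)
y = x**2

print(len(x))        # how many points we generated
print(x[1] - x[0])   # the gap between neighbouring points
print(x[:3])         # the first few x values
print(y[:3])         # the same values, each one squared
```

Because both end points are included, 101 points give 100 equal gaps between 0 and 10.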
We can then plot the graph of x versus y by using the plt.plot() function:
plt.plot(x,y)
[<matplotlib.lines.Line2D at 0x1eb758f7358>]
Great! From here we could add more plots, linestyles, x labels, y labels, titles and more using various methods:
plt.plot(x,x**2,x,x**3)
plt.xlabel("x")
plt.ylabel("y")
plt.title("Graph of x squared and x cubed")
<matplotlib.text.Text at 0x1eb759992b0>
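Linestyles were mentioned above but not shown, so here is a small sketch of one way to set them in the functional style. It relies on the short format strings that plt.plot accepts after each x, y pair; the particular colours and styles are just choices made for this example.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 101)

# A format string sets colour and line style in one go:
# 'r--' means a red dashed line, 'b:' means a blue dotted line.
lines = plt.plot(x, x**2, 'r--', x, x**3, 'b:')
plt.xlabel("x")
plt.ylabel("y")
plt.title("Graph of x squared and x cubed")
```

plt.plot hands back one line object per curve, which is why the notebook prints a list after a plot call.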
We can also create multiple plots on one output. These are called subplots, and to do this we use the plt.subplot() function - this takes 3 arguments - the number of rows, the number of columns, and then the plot number we're referring to. This may seem like an ugly way to do things, and we'll see a better way to do it later with the object-oriented method.
plt.subplot(1,2,1)
plt.plot(x,x**2)
plt.subplot(1,2,2)
plt.plot(x**2,x,'g')
[<matplotlib.lines.Line2D at 0x1eb75a65128>]
The problem with the functional method is that it's a bit of a mess: commands can be hard to work with and understand at a glance, and problems often happen because it is ambiguous which plot a command applies to. Python works on an "object-oriented" philosophy, so it makes sense for us to use a similar methodology for graphing. The object-oriented way of creating plots works by creating figure objects and calling methods on them.
To start, we need to create a figure:
fig = plt.figure()
<matplotlib.figure.Figure at 0x1eb7594e710>
Figures are a blank canvas for us to work on. To add axes, we use the add_axes() method. We can always look at our figure by just running it as a line of code:
axes = fig.add_axes([0,0,1,1])
fig
Technically, the "add_axes" method takes in one argument - a list. This list has 4 elements - the x position of the left edge of the axes, measured from the left of the figure as a fraction of the figure's width, the y position of the bottom edge, measured from the bottom of the figure as a fraction of its height - so here, 0,0 just means we don't want white space - and then the width and height, also as fractions of the figure (so here, 1,1 means the axes fill the whole figure).
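To make the four numbers concrete, here is a minimal sketch with a different list, where each value is a fraction of the figure between 0 and 1. The names demo_fig and demo_axes are made up for this example so that they don't overwrite the fig and axes we are working with.

```python
import matplotlib.pyplot as plt

demo_fig = plt.figure()

# [left, bottom, width, height], each as a fraction of the figure.
# These axes start 10% in from the left and bottom edges and cover
# 80% of the figure's width and height, leaving a margin all round.
demo_axes = demo_fig.add_axes([0.1, 0.1, 0.8, 0.8])

# The axes remember the rectangle they were given.
left, bottom, width, height = demo_axes.get_position().bounds
print(left, bottom, width, height)
```

Leaving a margin like this gives the tick labels and axis labels somewhere to sit inside the figure.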
Now if we want to plot something on these axes, we call the "plot" method on the axes object:
axes.plot(x,y)
fig
We can also add x labels, y labels, and more using the following methods:
axes.set_xlabel('X')
axes.set_ylabel('Y')
axes.set_title('x vs x squared')
fig
Now let's see why the object-oriented way of creating graphs is so much more powerful:
To start, we can create graphs within graphs:
fig = plt.figure()
axes1 = fig.add_axes([0,0,1,1])
axes2 = fig.add_axes([0.1,0.6,0.4,0.3]) #Creating axes within our axes
axes1.plot(x,y)
axes2.plot(y,x)
axes1.set_xlabel('X')
axes1.set_ylabel('Y')
axes1.set_title('x vs x squared')
axes2.set_xlabel('X')
axes2.set_ylabel('Y')
axes2.set_title('Inverse')
<matplotlib.text.Text at 0x1eb75c43358>
We can also create subplots as before using a shortcut:
fig, axes = plt.subplots()
This combines the "fig" and "axes" calls into one, using tuple unpacking, and assumes we want a single large plot. We can add arguments to the subplots function to create more plots:
fig, axes = plt.subplots(2,2)
Notice that the numbers along the axes of neighbouring plots are a little bit squished together - we can fix this using the plt.tight_layout() function:
fig, axes = plt.subplots(2,2)
plt.tight_layout()
Great! For now, to learn how to access these axes let's work with a 1x2 subplot grid (one row, two columns).
If we call the "axes" object after using the subplots command, we can see that "axes" is just a NumPy array of axes objects, which behaves much like a list:
fig, axes = plt.subplots(1,2)
axes #Notice the square brackets? It's a list (technically a numpy array)!
array([<matplotlib.axes._subplots.AxesSubplot object at 0x000001EB75B213C8>, <matplotlib.axes._subplots.AxesSubplot object at 0x000001EB760ED940>], dtype=object)
This means we can iterate over it, but more importantly, we can select what axes we want by using indexing:
axes[0].plot(x,x)
axes[1].plot(x,x**2)
fig
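Indexing was shown above, so here is a small sketch of the other option, iterating. It assumes a 2x2 grid, where "axes" is a two-dimensional array, and uses the array's .flat attribute to visit each set of axes in turn. The list called powers is made up for this example.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 101)
powers = [1, 2, 3, 4]

fig, axes = plt.subplots(2, 2)

# axes is a 2x2 NumPy array here; .flat walks through it one Axes at a time,
# row by row, so each power gets its own plot.
for ax, p in zip(axes.flat, powers):
    ax.plot(x, x**p)
    ax.set_title(f"x to the power {p}")

fig.tight_layout()
```

With a two-dimensional grid we could also pick out a single plot with two indices, for example axes[0, 1] for the top right.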
Say we don't want to use two axes and we'd rather use one set of axes with several lines. We already know how to do this:
fig, ax = plt.subplots()
ax.plot(x,x)
ax.plot(x,x**2)
ax.plot(x,x**3)
[<matplotlib.lines.Line2D at 0x1eb75ec1160>]
How can we tell at a glance which plot is which? A legend would be helpful here - to do this we need to edit our code a little bit:
fig, ax = plt.subplots()
ax.plot(x,x, label='Linear')
ax.plot(x,x**2, label='Squared')
ax.plot(x,x**3, label='Cubed')
ax.legend()
<matplotlib.legend.Legend at 0x1eb75e15940>
Notice how we need to label each line within its own plot call, and then call the legend method. A similar approach is used to change the colour and linestyle of these plots, where lw sets the line width and ls sets the line style:
fig, ax = plt.subplots()
ax.plot(x,x, label='Linear', color = "purple", lw=3,ls=':')
ax.plot(x,x**2, label='Squared', color = "pink", lw =3,ls='-.')
ax.plot(x,x**3, label='Cubed', color = "blue", lw=3,ls='--')
ax.legend()
<matplotlib.legend.Legend at 0x1eb775811d0>
There are hundreds of different plot options out there, as well as the option for 3d plots, contour plots, log scaling and more! To check these out, take a look at the Matplotlib documentation.
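As a taste of one of those options, here is a minimal sketch of log scaling on the y axis using the set_yscale method. The two curves are just choices made for this example - on a log scale, exponential curves like these show up as straight lines.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 101)

fig, ax = plt.subplots()
ax.plot(x, np.exp(x), label='Exponential')
ax.plot(x, 2**x, label='2 Power')

# Switch the y axis from a linear scale to a logarithmic one.
ax.set_yscale('log')
ax.legend()
```

There is a matching set_xscale method if we want the x axis on a log scale as well.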
Finally, it's worth noting that we can use the object-oriented method for scatter graphs, boxplots and histograms.
To start, let's take a scatter plot looking at an (albeit fake) positive correlation.
fig, axes = plt.subplots()
randomData = 0.4 * np.random.randn(101) + 0.5 #Nice way to fake a correlation - 0.4 is the spread (standard deviation) of the noise and 0.5 is its mean. Adding x below gives a slope of 1 and an intercept of 0.5.
axes.scatter(x,randomData + x)
<matplotlib.collections.PathCollection at 0x1eb7773d4a8>
Next, let's look at how to create a histogram representing a normal distribution:
fig, axes = plt.subplots()
data = np.random.randn(1001)
axes.hist(data)
(array([ 25., 55., 138., 207., 230., 185., 108., 35., 14., 4.]), array([-2.56993424, -1.97147985, -1.37302546, -0.77457106, -0.17611667, 0.42233772, 1.02079212, 1.61924651, 2.2177009 , 2.8161553 , 3.41460969]), <a list of 10 Patch objects>)
And finally, let's look at boxplots:
fig, axes = plt.subplots()
data = [np.random.normal(0,1,100),np.random.normal(0,2,100),np.random.normal(0,3,100)]
axes.boxplot(data)
{'boxes': [<matplotlib.lines.Line2D at 0x1eb7793e5f8>, <matplotlib.lines.Line2D at 0x1eb77961048>, <matplotlib.lines.Line2D at 0x1eb77979358>], 'caps': [<matplotlib.lines.Line2D at 0x1eb77949fd0>, <matplotlib.lines.Line2D at 0x1eb77952828>, <matplotlib.lines.Line2D at 0x1eb77969940>, <matplotlib.lines.Line2D at 0x1eb77969b00>, <matplotlib.lines.Line2D at 0x1eb7797fc18>, <matplotlib.lines.Line2D at 0x1eb77986ac8>], 'fliers': [<matplotlib.lines.Line2D at 0x1eb77958898>, <matplotlib.lines.Line2D at 0x1eb7796fb70>, <matplotlib.lines.Line2D at 0x1eb77990b38>], 'means': [], 'medians': [<matplotlib.lines.Line2D at 0x1eb779529e8>, <matplotlib.lines.Line2D at 0x1eb7796f358>, <matplotlib.lines.Line2D at 0x1eb77986c88>], 'whiskers': [<matplotlib.lines.Line2D at 0x1eb7793ef60>, <matplotlib.lines.Line2D at 0x1eb779497b8>, <matplotlib.lines.Line2D at 0x1eb779618d0>, <matplotlib.lines.Line2D at 0x1eb77961a90>, <matplotlib.lines.Line2D at 0x1eb77979ba8>, <matplotlib.lines.Line2D at 0x1eb7797fa58>]}
Notice how the boxplot and histogram methods also output a lot of other useful information about the plots, which we can access later if we want to. If we don't want to see this text, we can end the plotting line with a semicolon, or just run fig again in another cell:
fig
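If we do want that extra information, we can store it in variables instead of letting the notebook print it. The sketch below is one way to do this, with made-up names (demo_fig, hist_ax, box_ax, samples, groups) so that it doesn't overwrite the variables used above.

```python
import matplotlib.pyplot as plt
import numpy as np

demo_fig, (hist_ax, box_ax) = plt.subplots(1, 2)

# hist returns the bar heights, the bin edges and the drawn bar objects.
samples = np.random.randn(1001)
counts, bin_edges, patches = hist_ax.hist(samples)
print(counts.sum())    # every sample lands in exactly one bin
print(len(bin_edges))  # always one more edge than there are bins

# boxplot returns a dictionary of the lines it drew, grouped by role.
groups = [np.random.normal(0, s, 100) for s in (1, 2, 3)]
box = box_ax.boxplot(groups)
medians = [line.get_ydata()[0] for line in box['medians']]
print(medians)         # the median of each group, read off the plot
```

Assigning the result to a variable also stops the notebook from printing it, since only the value of the last unassigned expression in a cell is shown.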
We're going to create a graph that compares the functions e^x, 2^x, 3^x, x^2 and x^3. Soon we will be able to import data so we're not working with mathematical functions all the time!
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
x = np.linspace(0,10,101)
fig = plt.figure()
axes = fig.add_axes([0,0,1,1])
axes.plot(x, np.exp(x), label = "Exponential")
axes.plot(x, 2**x, label = "2 Power")
axes.plot(x, 3**x, label = "3 Power")
axes.plot(x, x**2, label = "Squared")
axes.plot(x, x**3, label = "Cubed")
axes.set_ylim((0,1000))
axes.legend()
<matplotlib.legend.Legend at 0x1eb77a383c8>
Try graphing the sin(), cos() and tan() functions on one graph, adding a legend, title and axis labels. Then, try using the subplots command to put them all on different plots.