#!/usr/bin/env python
# coding: utf-8

# *This notebook contains an excerpt from the [Python Data Science Handbook](http://shop.oreilly.com/product/0636920034919.do) by Jake VanderPlas; the content is available [on GitHub](https://github.com/jakevdp/PythonDataScienceHandbook).*
#
# *The text is released under the [CC-BY-NC-ND license](https://creativecommons.org/licenses/by-nc-nd/3.0/us/legalcode), and code is released under the [MIT license](https://opensource.org/licenses/MIT). If you find this content useful, please consider supporting the work by [buying the book](http://shop.oreilly.com/product/0636920034919.do)!*
#
# *No changes were made to the contents of this notebook from the original.*

# < [Customizing Plot Legends](04.06-Customizing-Legends.ipynb) | [Contents](Index.ipynb) | [Multiple Subplots](04.08-Multiple-Subplots.ipynb) >

# # Customizing Colorbars

# Plot legends identify discrete labels of discrete points.
# For continuous labels based on the color of points, lines, or regions, a labeled colorbar can be a great tool.
# In Matplotlib, a colorbar is a separate axes that can provide a key for the meaning of colors in a plot.
# Because the book is printed in black-and-white, this section has an accompanying online supplement where you can view the figures in full color (https://github.com/jakevdp/PythonDataScienceHandbook).
# We'll start by setting up the notebook for plotting and importing the functions we will use:

# In[1]:


import matplotlib.pyplot as plt
plt.style.use('classic')


# In[2]:


get_ipython().run_line_magic('matplotlib', 'inline')
import numpy as np


# As we have seen several times throughout this section, the simplest colorbar can be created with the ``plt.colorbar`` function:

# In[3]:


x = np.linspace(0, 10, 1000)
I = np.sin(x) * np.cos(x[:, np.newaxis])

plt.imshow(I)
plt.colorbar();


# We'll now discuss a few ideas for customizing these colorbars and using them effectively in various situations.
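# Because a colorbar is a separate axes, the same figure can also be built with the object-oriented interface, which hands back the colorbar object for later tweaks.
# The following is a minimal sketch rather than part of the original notebook: the ``Agg`` backend line, the names ``im`` and ``cbar``, and the label text are assumptions made purely for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Same image as in the example above
x = np.linspace(0, 10, 1000)
I = np.sin(x) * np.cos(x[:, np.newaxis])

fig, ax = plt.subplots()
im = ax.imshow(I)                  # the mappable that the colorbar describes
cbar = fig.colorbar(im, ax=ax)     # returns a Colorbar object
cbar.set_label("sin(x) cos(y)")    # illustrative label text

# The colorbar is drawn in its own axes, separate from the image axes
print(cbar.ax is not ax, len(fig.axes))
```

# Keeping a reference such as ``cbar`` is convenient when the label or ticks need to be adjusted after the colorbar has been created.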
# ## Customizing Colorbars
#
# The colormap can be specified using the ``cmap`` argument to the plotting function that is creating the visualization:

# In[4]:


plt.imshow(I, cmap='gray');


# All the available colormaps are in the ``plt.cm`` namespace; using IPython's tab-completion will give you a full list of built-in possibilities:
# ```
# plt.cm.<TAB>
# ```
# But being *able* to choose a colormap is just the first step: more important is how to *decide* among the possibilities!
# The choice turns out to be much more subtle than you might initially expect.

# ### Choosing the Colormap
#
# A full treatment of color choice within visualization is beyond the scope of this book, but for entertaining reading on this subject and others, see the article ["Ten Simple Rules for Better Figures"](http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003833).
# Matplotlib's online documentation also has an [interesting discussion](http://Matplotlib.org/1.4.1/users/colormaps.html) of colormap choice.
#
# Broadly, you should be aware of three different categories of colormaps:
#
# - *Sequential colormaps*: These are made up of one continuous sequence of colors (e.g., ``binary`` or ``viridis``).
# - *Divergent colormaps*: These usually contain two distinct colors, which show positive and negative deviations from a mean (e.g., ``RdBu`` or ``PuOr``).
# - *Qualitative colormaps*: These mix colors with no particular sequence (e.g., ``rainbow`` or ``jet``).
#
# The ``jet`` colormap, which was the default in Matplotlib prior to version 2.0, is an example of a qualitative colormap.
# Its status as the default was quite unfortunate, because qualitative maps are often a poor choice for representing quantitative data.
# Among the problems is the fact that qualitative maps usually do not display any uniform progression in brightness as the scale increases.
#
# We can see this by converting the ``jet`` colorbar into black and white:

# In[5]:


from matplotlib.colors import LinearSegmentedColormap

def grayscale_cmap(cmap):
    """Return a grayscale version of the given colormap"""
    # plt.get_cmap accepts either a colormap name or a Colormap instance
    cmap = plt.get_cmap(cmap)
    colors = cmap(np.arange(cmap.N))

    # convert RGBA to perceived grayscale luminance
    # cf. http://alienryderflex.com/hsp.html
    RGB_weight = [0.299, 0.587, 0.114]
    luminance = np.sqrt(np.dot(colors[:, :3] ** 2, RGB_weight))
    colors[:, :3] = luminance[:, np.newaxis]

    return LinearSegmentedColormap.from_list(cmap.name + "_gray", colors, cmap.N)


def view_colormap(cmap):
    """Plot a colormap with its grayscale equivalent"""
    cmap = plt.get_cmap(cmap)
    colors = cmap(np.arange(cmap.N))

    cmap = grayscale_cmap(cmap)
    grayscale = cmap(np.arange(cmap.N))

    fig, ax = plt.subplots(2, figsize=(6, 2),
                           subplot_kw=dict(xticks=[], yticks=[]))
    ax[0].imshow([colors], extent=[0, 10, 0, 1])
    ax[1].imshow([grayscale], extent=[0, 10, 0, 1])


# In[6]:


view_colormap('jet')


# Notice the bright stripes in the grayscale image.
# Even in full color, this uneven brightness means that the eye will be drawn to certain portions of the color range, which will potentially emphasize unimportant parts of the dataset.
# It's better to use a colormap such as ``viridis`` (the default as of Matplotlib 2.0), which is specifically constructed to have an even brightness variation across the range.
# Thus it not only plays well with our color perception, but also will translate well to grayscale printing:

# In[7]:


view_colormap('viridis')


# If you favor rainbow schemes, another good option for continuous data is the ``cubehelix`` colormap:

# In[8]:


view_colormap('cubehelix')


# For other situations, such as showing positive and negative deviations from some mean, dual-color colorbars such as ``RdBu`` (*Red-Blue*) can be useful. However, as you can see in the following figure, it's important to note that the positive-negative information will be lost upon translation to grayscale!
# In[9]:


view_colormap('RdBu')


# We'll see examples of using some of these colormaps as we continue.
#
# There are a large number of colormaps available in Matplotlib; to see a list of them, you can use IPython to explore the ``plt.cm`` submodule. For a more principled approach to colors in Python, you can refer to the tools and documentation within the Seaborn library (see [Visualization With Seaborn](04.14-Visualization-With-Seaborn.ipynb)).

# ### Color Limits and Extensions
#
# Matplotlib allows for a large range of colorbar customization.
# The colorbar itself is simply an instance of ``plt.Axes``, so all of the axes and tick formatting tricks we've learned are applicable.
# The colorbar has some interesting flexibility: for example, we can narrow the color limits and indicate the out-of-bounds values with a triangular arrow at the top and bottom by setting the ``extend`` property.
# This might come in handy, for example, if displaying an image that is subject to noise:

# In[10]:


# make noise in 1% of the image pixels
speckles = (np.random.random(I.shape) < 0.01)
I[speckles] = np.random.normal(0, 3, np.count_nonzero(speckles))

plt.figure(figsize=(10, 3.5))

# left panel: default color limits
plt.subplot(1, 2, 1)
plt.imshow(I, cmap='RdBu')
plt.colorbar()

# right panel: manual color limits with out-of-bounds extensions
plt.subplot(1, 2, 2)
plt.imshow(I, cmap='RdBu')
plt.colorbar(extend='both')
plt.clim(-1, 1);


# Notice that in the left panel, the default color limits respond to the noisy pixels, and the range of the noise completely washes out the pattern we are interested in.
# In the right panel, we manually set the color limits, and add extensions to indicate values that are above or below those limits.
# The result is a much more useful visualization of our data.

# ### Discrete Color Bars
#
# Colormaps are by default continuous, but sometimes you'd like to represent discrete values.
# The easiest way to do this is to use the ``plt.get_cmap()`` function (older Matplotlib releases also offered it as ``plt.cm.get_cmap()``), and pass the name of a suitable colormap along with the number of desired bins:

# In[11]:


plt.imshow(I, cmap=plt.get_cmap('Blues', 6))
plt.colorbar()
plt.clim(-1, 1);


# The discrete version of a colormap can be used just like any other colormap.

# ## Example: Handwritten Digits
#
# For an example of where this might be useful, let's look at an interesting visualization of some handwritten digits data.
# This data is included in Scikit-Learn, and consists of nearly 2,000 $8 \times 8$ thumbnails showing various handwritten digits.
#
# For now, let's start by loading the digits data and visualizing several of the example images with ``plt.imshow()``:

# In[12]:


# load images of the digits 0 through 5 and visualize several of them
from sklearn.datasets import load_digits
digits = load_digits(n_class=6)

fig, ax = plt.subplots(8, 8, figsize=(6, 6))
for i, axi in enumerate(ax.flat):
    axi.imshow(digits.images[i], cmap='binary')
    axi.set(xticks=[], yticks=[])


# Because each digit is defined by the brightness of its 64 pixels, we can consider each digit to be a point lying in 64-dimensional space: each dimension represents the brightness of one pixel.
# But visualizing relationships in such high-dimensional spaces can be extremely difficult.
# One way to approach this is to use a *dimensionality reduction* technique such as manifold learning to reduce the dimensionality of the data while maintaining the relationships of interest.
# Dimensionality reduction is an example of unsupervised machine learning, and we will discuss it in more detail in [What Is Machine Learning?](05.01-What-Is-Machine-Learning.ipynb).
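# To make the "point in 64-dimensional space" picture concrete, the following sketch compares the image view of the data with its flattened view.
# It is an illustration added here, not part of the original notebook, and the variable names ``first_image`` and ``first_point`` are assumptions chosen for the example.

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits(n_class=6)

# Each sample is stored twice: as an 8x8 image and as a flat 64-element vector
print(digits.images.shape)   # (n_samples, 8, 8)
print(digits.data.shape)     # (n_samples, 64)

first_image = digits.images[0]
first_point = digits.data[0]

# Flattening the image row by row gives the 64 coordinates of the point
same = np.array_equal(first_image.ravel(), first_point)
print(same)
```

# The flattened ``digits.data`` array is the form that a dimensionality reduction method takes as input.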
#
# Deferring the discussion of these details, let's take a look at a two-dimensional manifold learning projection of this digits data (see [In-Depth: Manifold Learning](05.10-Manifold-Learning.ipynb) for details):

# In[13]:


# project the digits into 2 dimensions using Isomap
from sklearn.manifold import Isomap
iso = Isomap(n_components=2)
projection = iso.fit_transform(digits.data)


# We'll use our discrete colormap to view the results, setting the ``ticks`` and ``clim`` to improve the aesthetics of the resulting colorbar:

# In[14]:


# plot the results
plt.scatter(projection[:, 0], projection[:, 1], lw=0.1,
            c=digits.target, cmap=plt.get_cmap('cubehelix', 6))
plt.colorbar(ticks=range(6), label='digit value')
plt.clim(-0.5, 5.5)


# The projection also gives us some interesting insights into the relationships within the dataset: for example, the ranges of 5 and 3 nearly overlap in this projection, indicating that some handwritten fives and threes are difficult to distinguish, and therefore more likely to be confused by an automated classification algorithm.
# Other values, like 0 and 1, are more distantly separated, and therefore much less likely to be confused.
# This observation agrees with our intuition, because 5 and 3 look much more similar than do 0 and 1.
#
# We'll return to manifold learning and to digit classification in [Chapter 5](05.00-Machine-Learning.ipynb).
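# The color limits of ``-0.5`` and ``5.5`` used in the digits plot are what place each integer label in the middle of its own color band.
# The sketch below checks this numerically without drawing anything; it is an added illustration, and the names ``n_bins``, ``norm``, ``labels``, and ``scaled`` are assumptions made for the example.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import Normalize

n_bins = 6
cmap = plt.get_cmap('cubehelix', n_bins)          # discrete colormap with 6 colors
norm = Normalize(vmin=-0.5, vmax=n_bins - 0.5)    # same limits as plt.clim(-0.5, 5.5)

labels = np.arange(n_bins)   # the digit values 0 through 5
scaled = norm(labels)        # position of each label on the 0-to-1 color scale
colors = cmap(scaled)        # one RGBA row per label

# Each label should sit at the center of its bin and receive its own color
n_distinct = len(np.unique(colors, axis=0))
print(np.round(np.asarray(scaled), 3))
print(n_distinct)
```

# With limits of ``0`` and ``5`` instead, the labels would fall on the edges of the bins, so the ticks would no longer be centered on the color bands.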