How to make beautiful data visualizations in Python with matplotlib and Plotly

This notebook comes in response to this Twitter conversation about converting a certain matplotlib figure featured in Randal S. Olson's blog post (link here and embedded below).


July 07 2014 EDIT:

Plotly's matplotlib converter has been upgraded in the past week and is now distributed in the latest version of Plotly's Python package, plotly 1.1.2. These upgrades make the conversion of this particular matplotlib plot (shown below) significantly easier.


In [1]:
from IPython.display import IFrame
IFrame("http://www.randalolson.com/2014/06/28/"
       "how-to-make-beautiful-data-visualizations-in-python-with-matplotlib/", 
       720, 400)
Out[1]:

For this particular figure,

In [2]:
IFrame("http://www.randalolson.com/wp-content/uploads/"
       "percent-bachelors-degrees-women-usa.png",
       1000, 1000)
Out[2]:


In brief, we will show you how to programmatically remake what Plotly user Dreamshot made here.




First, check which version of the Python API is installed on your machine:

In [3]:
import plotly
plotly.__version__
Out[3]:
'1.1.2'

If it is not the latest (version 1.1.2), upgrade using pip:

$ pip install plotly --upgrade
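The version check can also be done programmatically. The following is a minimal sketch, assuming a purely numeric dotted version string; `version_tuple` is a hypothetical helper and the `installed` value is a stand-in for `plotly.__version__`. Comparing tuples of integers avoids the pitfall of comparing version strings character by character.

```python
def version_tuple(version):
    """Turn a dotted version string such as '1.1.2' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))

REQUIRED = (1, 1, 2)   # the version this notebook was written against

installed = "1.1.2"    # stand-in for plotly.__version__ (assumption)
needs_upgrade = version_tuple(installed) < REQUIRED
```

With the values above, `needs_upgrade` is `False`; an older string such as `'1.0.33'` would compare as smaller than `REQUIRED`.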

Next, if you have a Plotly account as well as a credentials file set up on your machine, signing in to Plotly's servers is done automatically when importing plotly.plotly:

In [4]:
import plotly.plotly as py  

For more info on how to sign up or sign in to Plotly, see Plotly's Python API User Guide.

If more convenient, you can manually sign in to Plotly by typing:

>>> py.sign_in('your_username','your_api_key')

We also make use of Plotly's tools module in this notebook; import it here:

In [5]:
import plotly.tools as tls

Original matplotlib figure

First, remake the original matplotlib figure. The following code cell was copied verbatim from Randal S. Olson's blog post, with the exception of the last line, where we grab the underlying figure object and link it to a variable named dataviz1.

In [6]:
%pylab inline  
from pandas import read_csv  
  
# Read the data into a pandas DataFrame.  
gender_degree_data = read_csv("http://www.randalolson.com/wp-content/uploads/"
                              "percent-bachelors-degrees-women-usa.csv")  
  
# These are the "Tableau 20" colors as RGB.  
tableau20 = [(31, 119, 180), (174, 199, 232), (255, 127, 14), (255, 187, 120),  
             (44, 160, 44), (152, 223, 138), (214, 39, 40), (255, 152, 150),  
             (148, 103, 189), (197, 176, 213), (140, 86, 75), (196, 156, 148),  
             (227, 119, 194), (247, 182, 210), (127, 127, 127), (199, 199, 199),  
             (188, 189, 34), (219, 219, 141), (23, 190, 207), (158, 218, 229)]  
  
# Scale the RGB values to the [0, 1] range, which is the format matplotlib accepts.  
for i in range(len(tableau20)):  
    r, g, b = tableau20[i]  
    tableau20[i] = (r / 255., g / 255., b / 255.)  
  
# You typically want your plot to be ~1.33x wider than tall. This plot is a rare  
# exception because of the number of lines being plotted on it.  
# Common sizes: (10, 7.5) and (12, 9)  
figure(figsize=(12, 14))  
  
# Remove the plot frame lines. They are unnecessary chartjunk.  
ax = subplot(111)  
ax.spines["top"].set_visible(False)  
ax.spines["bottom"].set_visible(False)  
ax.spines["right"].set_visible(False)  
ax.spines["left"].set_visible(False)  
  
# Ensure that the axis ticks only show up on the bottom and left of the plot.  
# Ticks on the right and top of the plot are generally unnecessary chartjunk.  
ax.get_xaxis().tick_bottom()  
ax.get_yaxis().tick_left()  
  
# Limit the range of the plot to only where the data is.  
# Avoid unnecessary whitespace.  
ylim(0, 90)  
xlim(1968, 2014)  
  
# Make sure your axis ticks are large enough to be easily read.  
# You don't want your viewers squinting to read your plot.  
yticks(range(0, 91, 10), [str(x) + "%" for x in range(0, 91, 10)], fontsize=14)  
xticks(fontsize=14)  
  
# Provide tick lines across the plot to help your viewers trace along  
# the axis ticks. Make sure that the lines are light and small so they  
# don't obscure the primary data lines.  
for y in range(10, 91, 10):  
    plot(range(1968, 2012), [y] * len(range(1968, 2012)), "--", 
         lw=0.5, color="black", alpha=0.3)  
  
# Remove the tick marks; they are unnecessary with the tick lines we just plotted.  
plt.tick_params(axis="both", which="both", bottom="off", top="off",  
                labelbottom="on", left="off", right="off", labelleft="on")  
  
# Now that the plot is prepared, it's time to actually plot the data!  
# Note that I plotted the majors in order of the highest % in the final year.  
majors = ['Health Professions', 'Public Administration', 'Education', 'Psychology',  
          'Foreign Languages', 'English', 'Communications\nand Journalism',  
          'Art and Performance', 'Biology', 'Agriculture',  
          'Social Sciences and History', 'Business', 'Math and Statistics',  
          'Architecture', 'Physical Sciences', 'Computer Science',  
          'Engineering']  
  
for rank, column in enumerate(majors):  
    # Plot each line separately with its own color, using the Tableau 20  
    # color set in order.  
    plot(gender_degree_data.Year.values,  
            gender_degree_data[column.replace("\n", " ")].values,  
            lw=2.5, color=tableau20[rank])  
      
    # Add a text label to the right end of every line. Most of the code below  
    # is adding specific offsets y position because some labels overlapped.  
    y_pos = gender_degree_data[column.replace("\n", " ")].values[-1] - 0.5  
    if column == "Foreign Languages":  
        y_pos += 0.5  
    elif column == "English":  
        y_pos -= 0.5  
    elif column == "Communications\nand Journalism":  
        y_pos += 0.75  
    elif column == "Art and Performance":  
        y_pos -= 0.25  
    elif column == "Agriculture":  
        y_pos += 1.25  
    elif column == "Social Sciences and History":  
        y_pos += 0.25  
    elif column == "Business":  
        y_pos -= 0.75  
    elif column == "Math and Statistics":  
        y_pos += 0.75  
    elif column == "Architecture":  
        y_pos -= 0.75  
    elif column == "Computer Science":  
        y_pos += 0.75  
    elif column == "Engineering":  
        y_pos -= 0.25  
      
    # Again, make sure that all labels are large enough to be easily read  
    # by the viewer.  
    text(2011.5, y_pos, column, fontsize=14, color=tableau20[rank])  
      
# matplotlib's title() call centers the title on the plot, but not the graph,  
# so I used the text() call to customize where the title goes.  
  
# Make the title big enough so it spans the entire plot, but don't make it  
# so big that it requires two lines to show.  
  
# Note that if the title is descriptive enough, it is unnecessary to include  
# axis labels; they are self-evident, in this plot's case.  
text(1995, 93, "Percentage of Bachelor's degrees conferred to women in the U.S.A."  
       ", by major (1970-2012)", fontsize=17, ha="center")  
  
# Always include your data source(s) and copyright notice! And for your  
# data sources, tell your viewers exactly where the data came from,  
# preferably with a direct link to the data. Just telling your viewers  
# that you used data from the "U.S. Census Bureau" is completely useless:  
# the U.S. Census Bureau provides all kinds of data, so how are your  
# viewers supposed to know which data set you used?  
text(1966, -8, "Data source: nces.ed.gov/programs/digest/2013menu_tables.asp"  
       "\nAuthor: Randy Olson (randalolson.com / @randal_olson)"  
       "\nNote: Some majors are missing because the historical data "  
       "is not available for them", fontsize=10)  
  
# Finally, save the figure as a PNG.  
# You can also save it as a PDF, JPEG, etc.  
# Just change the file extension in this call.  
# bbox_inches="tight" removes all the extra whitespace on the edges of your plot.  
#savefig("percent-bachelors-degrees-women-usa.png", bbox_inches="tight");  

# (!) Grab figure object and link it to variable (must be in same cell as figure)
dataviz1 = gcf()
Populating the interactive namespace from numpy and matplotlib


Plotly allows you to convert a matplotlib figure object (dataviz1 in our case here) into a Plotly figure with one line of code:

In [7]:
py.iplot_mpl(dataviz1, resize=False, filename='dataviz1', width=960, height=1120)
/usr/local/lib/python2.7/dist-packages/plotly/matplotlylib/renderer.py:506: UserWarning:

Looks like the annotation(s) you are trying 
to draw lies/lay outside the given figure size.

Therefore, the resulting Plotly figure may not be 
large enough to view the full text. To adjust 
the size of the figure, use the 'width' and 
'height' keys in the Layout object. Alternatively,
use the Margin object to adjust the figure's margins.

/usr/local/lib/python2.7/dist-packages/plotly/tools.py:534: UserWarning:

Looks like you used a newline character: '\n'.

Plotly uses a subset of HTML escape characters
to do things like newline (<br>), bold (<b></b>),
italics (<i></i>), etc. Your newline characters 
have been converted to '<br>' so they will show 
up right on your Plotly figure!

Here, the resize=False keyword argument tells Plotly not to set the figure's size to the default Plotly dimensions. The width and height keyword arguments set the dimensions of the display box shown in this notebook.
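The values width=960 and height=1120 are consistent with the matplotlib figure size. As a small worked example, assuming a resolution of 80 dots per inch (the dpi value is an assumption here, not something stated in this notebook), figsize=(12, 14) translates to pixels as follows; `figure_size_in_pixels` is a hypothetical helper name.

```python
def figure_size_in_pixels(figsize, dpi):
    """Convert a (width, height) size in inches to whole pixels."""
    width_in, height_in = figsize
    return int(round(width_in * dpi)), int(round(height_in * dpi))

# figsize=(12, 14) is the size used in the matplotlib cell; dpi=80 is assumed.
size = figure_size_in_pixels((12, 14), dpi=80)
```

Under that assumption `size` equals `(960, 1120)`, the same width and height that appear in the layout of the converted figure.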

To view the graph in a different tab, click on the data and graph button in the bottom right corner of the plot, which leads you to the figure's unique URL.

While plotting, py.iplot_mpl() emitted two warnings:

  • Looks like the annotation(s) you are trying to draw lies/lay outside the given figure size.

This means that the original matplotlib figure contains annotation(s) lying outside the figure's margins. When rendering the figure (either with savefig() or inline in the notebook), matplotlib adjusts the margins to fit all the annotations. In contrast, running show() would yield truncated annotation(s).

So, we will have to adjust the margin slightly.

  • Looks like you used a newline character: '\n'.

Plotly uses a subset of HTML syntax to insert line breaks in strings. In version 1.1.2 of the Python API, all \n escape sequences are converted to <br> when sent to Plotly, so that multi-line strings render as desired.
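Both warnings can be illustrated in plain Python, without Plotly. The sketch below is illustrative only: `outside_axes` and `to_plotly_text` are hypothetical helper names, not part of matplotlib or Plotly. The check compares each text's anchor point with the axis limits set in the matplotlib cell (xlim(1968, 2014) and ylim(0, 90)); it does not account for the rendered extent of the text.

```python
XLIM = (1968, 2014)
YLIM = (0, 90)

def outside_axes(x, y, xlim=XLIM, ylim=YLIM):
    """Return True if the point (x, y) lies outside the axis limits."""
    return not (xlim[0] <= x <= xlim[1] and ylim[0] <= y <= ylim[1])

def to_plotly_text(text):
    """Replace newline characters with the <br> tag that Plotly understands."""
    return text.replace("\n", "<br>")

# Anchor points passed to text() in the matplotlib cell.
title_outside = outside_axes(1995, 93)     # above the top of the y-axis
footer_outside = outside_axes(1966, -8)    # left of and below the axes

# One of the line labels contains a newline character.
label_text = to_plotly_text("Communications\nand Journalism")
```

Both the title and the footer are anchored outside the axes, which is why the margins need attention, and the label becomes 'Communications<br>and Journalism'.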

Moreover, looking more closely at the Plotly figure,

  • Converting ticks (or the lack of ticks) is still an issue (that we are currently trying to fix), so we will have to remove them in this Python session.

  • The title is not exactly at the same position as on the original matplotlib figure. Therefore, we will add a Plotly title, which additionally will make the figure's URL more descriptive.

  • And finally, we will make full use of Plotly's interactivity by adding hover text to the data traces.

Plotly version

So, first let's convert the matplotlib figure object to a Plotly figure object:

In [8]:
dataviz1_plotly = tls.mpl_to_plotly(dataviz1)

print dataviz1_plotly.to_string()   # show plotly figure object in notebook
Figure(
    data=Data([
        Scatter(
            x=[1968.0, 1969.0, 1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975...],
            y=[10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, ...],
            mode='lines',
            name='_line0',
            line=Line(
                color='#000000',
                width=0.5,
                dash='dash',
                opacity=0.3
            ),
            xaxis='x1',
            yaxis='y1'
        ),
        Scatter(
            x=[1968.0, 1969.0, 1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975...],
            y=[20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, ...],
            mode='lines',
            name='_line1',
            line=Line(
                color='#000000',
                width=0.5,
                dash='dash',
                opacity=0.3
            ),
            xaxis='x1',
            yaxis='y1'
        ),
        Scatter(
            x=[1968.0, 1969.0, 1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975...],
            y=[30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, ...],
            mode='lines',
            name='_line2',
            line=Line(
                color='#000000',
                width=0.5,
                dash='dash',
                opacity=0.3
            ),
            xaxis='x1',
            yaxis='y1'
        ),
        Scatter(
            x=[1968.0, 1969.0, 1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975...],
            y=[40.0, 40.0, 40.0, 40.0, 40.0, 40.0, 40.0, 40.0, 40.0, 40.0, ...],
            mode='lines',
            name='_line3',
            line=Line(
                color='#000000',
                width=0.5,
                dash='dash',
                opacity=0.3
            ),
            xaxis='x1',
            yaxis='y1'
        ),
        Scatter(
            x=[1968.0, 1969.0, 1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975...],
            y=[50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, ...],
            mode='lines',
            name='_line4',
            line=Line(
                color='#000000',
                width=0.5,
                dash='dash',
                opacity=0.3
            ),
            xaxis='x1',
            yaxis='y1'
        ),
        Scatter(
            x=[1968.0, 1969.0, 1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975...],
            y=[60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, ...],
            mode='lines',
            name='_line5',
            line=Line(
                color='#000000',
                width=0.5,
                dash='dash',
                opacity=0.3
            ),
            xaxis='x1',
            yaxis='y1'
        ),
        Scatter(
            x=[1968.0, 1969.0, 1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975...],
            y=[70.0, 70.0, 70.0, 70.0, 70.0, 70.0, 70.0, 70.0, 70.0, 70.0, ...],
            mode='lines',
            name='_line6',
            line=Line(
                color='#000000',
                width=0.5,
                dash='dash',
                opacity=0.3
            ),
            xaxis='x1',
            yaxis='y1'
        ),
        Scatter(
            x=[1968.0, 1969.0, 1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975...],
            y=[80.0, 80.0, 80.0, 80.0, 80.0, 80.0, 80.0, 80.0, 80.0, 80.0, ...],
            mode='lines',
            name='_line7',
            line=Line(
                color='#000000',
                width=0.5,
                dash='dash',
                opacity=0.3
            ),
            xaxis='x1',
            yaxis='y1'
        ),
        Scatter(
            x=[1968.0, 1969.0, 1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975...],
            y=[90.0, 90.0, 90.0, 90.0, 90.0, 90.0, 90.0, 90.0, 90.0, 90.0, ...],
            mode='lines',
            name='_line8',
            line=Line(
                color='#000000',
                width=0.5,
                dash='dash',
                opacity=0.3
            ),
            xaxis='x1',
            yaxis='y1'
        ),
        Scatter(
            x=[1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975.0, 1976.0, 1977...],
            y=[77.099999999999994, 75.5, 76.900000000000006, 77.40000000000...],
            mode='lines',
            name='_line9',
            line=Line(
                color='#1F77B4',
                width=2.5,
                dash='solid',
                opacity=1
            ),
            xaxis='x1',
            yaxis='y1'
        ),
        Scatter(
            x=[1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975.0, 1976.0, 1977...],
            y=[68.400000000000006, 65.5, 62.600000000000001, 64.29999999999...],
            mode='lines',
            name='_line10',
            line=Line(
                color='#AEC7E8',
                width=2.5,
                dash='solid',
                opacity=1
            ),
            xaxis='x1',
            yaxis='y1'
        ),
        Scatter(
            x=[1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975.0, 1976.0, 1977...],
            y=[74.535327580000001, 74.149203689999993, 73.554519959999993, ...],
            mode='lines',
            name='_line11',
            line=Line(
                color='#FF7F0E',
                width=2.5,
                dash='solid',
                opacity=1
            ),
            xaxis='x1',
            yaxis='y1'
        ),
        Scatter(
            x=[1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975.0, 1976.0, 1977...],
            y=[44.399999999999999, 46.200000000000003, 47.600000000000001, ...],
            mode='lines',
            name='_line12',
            line=Line(
                color='#FFBB78',
                width=2.5,
                dash='solid',
                opacity=1
            ),
            xaxis='x1',
            yaxis='y1'
        ),
        Scatter(
            x=[1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975.0, 1976.0, 1977...],
            y=[73.799999999999997, 73.900000000000006, 74.599999999999994, ...],
            mode='lines',
            name='_line13',
            line=Line(
                color='#2CA02C',
                width=2.5,
                dash='solid',
                opacity=1
            ),
            xaxis='x1',
            yaxis='y1'
        ),
        Scatter(
            x=[1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975.0, 1976.0, 1977...],
            y=[65.570923429999993, 64.556485159999994, 63.664263200000001, ...],
            mode='lines',
            name='_line14',
            line=Line(
                color='#98DF8A',
                width=2.5,
                dash='solid',
                opacity=1
            ),
            xaxis='x1',
            yaxis='y1'
        ),
        Scatter(
            x=[1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975.0, 1976.0, 1977...],
            y=[35.299999999999997, 35.5, 36.600000000000001, 38.39999999999...],
            mode='lines',
            name='_line15',
            line=Line(
                color='#D62728',
                width=2.5,
                dash='solid',
                opacity=1
            ),
            xaxis='x1',
            yaxis='y1'
        ),
        Scatter(
            x=[1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975.0, 1976.0, 1977...],
            y=[59.700000000000003, 59.899999999999999, 60.399999999999999, ...],
            mode='lines',
            name='_line16',
            line=Line(
                color='#FF9896',
                width=2.5,
                dash='solid',
                opacity=1
            ),
            xaxis='x1',
            yaxis='y1'
        ),
        Scatter(
            x=[1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975.0, 1976.0, 1977...],
            y=[29.088362969999999, 29.394402849999999, 29.810221049999999, ...],
            mode='lines',
            name='_line17',
            line=Line(
                color='#9467BD',
                width=2.5,
                dash='solid',
                opacity=1
            ),
            xaxis='x1',
            yaxis='y1'
        ),
        Scatter(
            x=[1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975.0, 1976.0, 1977...],
            y=[4.2297979799999998, 5.452796685, 7.4207102200000001, 9.65360...],
            mode='lines',
            name='_line18',
            line=Line(
                color='#C5B0D5',
                width=2.5,
                dash='solid',
                opacity=1
            ),
            xaxis='x1',
            yaxis='y1'
        ),
        Scatter(
            x=[1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975.0, 1976.0, 1977...],
            y=[36.799999999999997, 36.200000000000003, 36.100000000000001, ...],
            mode='lines',
            name='_line19',
            line=Line(
                color='#8C564B',
                width=2.5,
                dash='solid',
                opacity=1
            ),
            xaxis='x1',
            yaxis='y1'
        ),
        Scatter(
            x=[1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975.0, 1976.0, 1977...],
            y=[9.0644389749999998, 9.5031865939999989, 10.5589621, 12.80460...],
            mode='lines',
            name='_line20',
            line=Line(
                color='#C49C94',
                width=2.5,
                dash='solid',
                opacity=1
            ),
            xaxis='x1',
            yaxis='y1'
        ),
        Scatter(
            x=[1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975.0, 1976.0, 1977...],
            y=[38.0, 39.0, 40.200000000000003, 40.899999999999999, 41.79999...],
            mode='lines',
            name='_line21',
            line=Line(
                color='#E377C2',
                width=2.5,
                dash='solid',
                opacity=1
            ),
            xaxis='x1',
            yaxis='y1'
        ),
        Scatter(
            x=[1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975.0, 1976.0, 1977...],
            y=[11.921005389999999, 12.003105590000001, 13.21459351, 14.7916...],
            mode='lines',
            name='_line22',
            line=Line(
                color='#F7B6D2',
                width=2.5,
                dash='solid',
                opacity=1
            ),
            xaxis='x1',
            yaxis='y1'
        ),
        Scatter(
            x=[1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975.0, 1976.0, 1977...],
            y=[13.800000000000001, 14.9, 14.800000000000001, 16.5, 18.19999...],
            mode='lines',
            name='_line23',
            line=Line(
                color='#7F7F7F',
                width=2.5,
                dash='solid',
                opacity=1
            ),
            xaxis='x1',
            yaxis='y1'
        ),
        Scatter(
            x=[1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975.0, 1976.0, 1977...],
            y=[13.6, 13.6, 14.9, 16.399999999999999, 18.899999999999999, 19...],
            mode='lines',
            name='_line24',
            line=Line(
                color='#C7C7C7',
                width=2.5,
                dash='solid',
                opacity=1
            ),
            xaxis='x1',
            yaxis='y1'
        ),
        Scatter(
            x=[1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975.0, 1976.0, 1977...],
            y=[0.80000000000000004, 1.0, 1.2, 1.6000000000000001, 2.2000000...],
            mode='lines',
            name='_line25',
            line=Line(
                color='#BCBD22',
                width=2.5,
                dash='solid',
                opacity=1
            ),
            xaxis='x1',
            yaxis='y1'
        )
    ]),
    layout=Layout(
        showlegend=False,
        autosize=False,
        width=960,
        height=1120,
        annotations=Annotations([
            Annotation(
                x=2011.5,
                y=84.3,
                xref='x1',
                yref='y1',
                text='Health Professions',
                font=Font(
                    size=14.0,
                    color='#1F77B4'
                ),
                align='left',
                showarrow=False,
                opacity=1,
                xanchor='left',
                yanchor='bottom'
            ),
            Annotation(
                x=2011.5,
                y=81.4,
                xref='x1',
                yref='y1',
                text='Public Administration',
                font=Font(
                    size=14.0,
                    color='#AEC7E8'
                ),
                align='left',
                showarrow=False,
                opacity=1,
                xanchor='left',
                yanchor='bottom'
            ),
            Annotation(
                x=2011.5,
                y=78.93281184,
                xref='x1',
                yref='y1',
                text='Education',
                font=Font(
                    size=14.0,
                    color='#FF7F0E'
                ),
                align='left',
                showarrow=False,
                opacity=1,
                xanchor='left',
                yanchor='bottom'
            ),
            Annotation(
                x=2011.5,
                y=76.2,
                xref='x1',
                yref='y1',
                text='Psychology',
                font=Font(
                    size=14.0,
                    color='#FFBB78'
                ),
                align='left',
                showarrow=False,
                opacity=1,
                xanchor='left',
                yanchor='bottom'
            ),
            Annotation(
                x=2011.5,
                y=69.5,
                xref='x1',
                yref='y1',
                text='Foreign Languages',
                font=Font(
                    size=14.0,
                    color='#2CA02C'
                ),
                align='left',
                showarrow=False,
                opacity=1,
                xanchor='left',
                yanchor='bottom'
            ),
            Annotation(
                x=2011.5,
                y=67.42673015,
                xref='x1',
                yref='y1',
                text='English',
                font=Font(
                    size=14.0,
                    color='#98DF8A'
                ),
                align='left',
                showarrow=False,
                opacity=1,
                xanchor='left',
                yanchor='bottom'
            ),
            Annotation(
                x=2011.5,
                y=62.45,
                xref='x1',
                yref='y1',
                text='Communications\nand Journalism',
                font=Font(
                    size=14.0,
                    color='#D62728'
                ),
                align='left',
                showarrow=False,
                opacity=1,
                xanchor='left',
                yanchor='bottom'
            ),
            Annotation(
                x=2011.5,
                y=60.45,
                xref='x1',
                yref='y1',
                text='Art and Performance',
                font=Font(
                    size=14.0,
                    color='#FF9896'
                ),
                align='left',
                showarrow=False,
                opacity=1,
                xanchor='left',
                yanchor='bottom'
            ),
            Annotation(
                x=2011.5,
                y=58.2423969,
                xref='x1',
                yref='y1',
                text='Biology',
                font=Font(
                    size=14.0,
                    color='#9467BD'
                ),
                align='left',
                showarrow=False,
                opacity=1,
                xanchor='left',
                yanchor='bottom'
            ),
            Annotation(
                x=2011.5,
                y=50.78718193,
                xref='x1',
                yref='y1',
                text='Agriculture',
                font=Font(
                    size=14.0,
                    color='#C5B0D5'
                ),
                align='left',
                showarrow=False,
                opacity=1,
                xanchor='left',
                yanchor='bottom'
            ),
            Annotation(
                x=2011.5,
                y=48.95,
                xref='x1',
                yref='y1',
                text='Social Sciences and History',
                font=Font(
                    size=14.0,
                    color='#8C564B'
                ),
                align='left',
                showarrow=False,
                opacity=1,
                xanchor='left',
                yanchor='bottom'
            ),
            Annotation(
                x=2011.5,
                y=46.93041792,
                xref='x1',
                yref='y1',
                text='Business',
                font=Font(
                    size=14.0,
                    color='#C49C94'
                ),
                align='left',
                showarrow=False,
                opacity=1,
                xanchor='left',
                yanchor='bottom'
            ),
            Annotation(
                x=2011.5,
                y=43.35,
                xref='x1',
                yref='y1',
                text='Math and Statistics',
                font=Font(
                    size=14.0,
                    color='#E377C2'
                ),
                align='left',
                showarrow=False,
                opacity=1,
                xanchor='left',
                yanchor='bottom'
            ),
            Annotation(
                x=2011.5,
                y=41.5234375,
                xref='x1',
                yref='y1',
                text='Architecture',
                font=Font(
                    size=14.0,
                    color='#F7B6D2'
                ),
                align='left',
                showarrow=False,
                opacity=1,
                xanchor='left',
                yanchor='bottom'
            ),
            Annotation(
                x=2011.5,
                y=39.6,
                xref='x1',
                yref='y1',
                text='Physical Sciences',
                font=Font(
                    size=14.0,
                    color='#7F7F7F'
                ),
                align='left',
                showarrow=False,
                opacity=1,
                xanchor='left',
                yanchor='bottom'
            ),
            Annotation(
                x=2011.5,
                y=18.45,
                xref='x1',
                yref='y1',
                text='Computer Science',
                font=Font(
                    size=14.0,
                    color='#C7C7C7'
                ),
                align='left',
                showarrow=False,
                opacity=1,
                xanchor='left',
                yanchor='bottom'
            ),
            Annotation(
                x=2011.5,
                y=16.75,
                xref='x1',
                yref='y1',
                text='Engineering',
                font=Font(
                    size=14.0,
                    color='#BCBD22'
                ),
                align='left',
                showarrow=False,
                opacity=1,
                xanchor='left',
                yanchor='bottom'
            ),
            Annotation(
                x=0.58616866063612849,
                y=1.032144227080936,
                xref='paper',
                yref='paper',
                text="Percentage of Bachelor's degrees conferred to women i...",
                font=Font(
                    size=17.0,
                    color='#000000'
                ),
                align='center',
                showarrow=False,
                opacity=1,
                xanchor='center',
                yanchor='bottom'
            ),
            Annotation(
                x=-0.043419900787855584,
                y=-0.088786600179005234,
                xref='paper',
                yref='paper',
                text='Data source: nces.ed.gov/programs/digest/2013menu_tab...',
                font=Font(
                    size=10.0,
                    color='#000000'
                ),
                align='left',
                showarrow=False,
                opacity=1,
                xanchor='left',
                yanchor='bottom'
            )
        ]),
        margin=Margin(
            l=120,
            r=95,
            b=140,
            t=111,
            pad=0
        ),
        hovermode='closest',
        xaxis1=XAxis(
            range=[1968.0, 2014.0],
            domain=[0.0, 1.0],
            type='linear',
            showgrid=False,
            zeroline=False,
            showline=False,
            nticks=7,
            ticks='inside',
            tickfont=Font(
                size=14.0
            ),
            anchor='y1',
            side='bottom',
            mirror=False
        ),
        yaxis1=YAxis(
            range=[0.0, 90.0],
            domain=[0.0, 1.0],
            type='linear',
            showgrid=False,
            zeroline=False,
            showline=False,
            autotick=False,
            ticks='inside',
            tick0=0,
            dtick=10,
            tickfont=Font(
                size=14.0
            ),
            anchor='x1',
            side='left',
            mirror=False
        )
    )
)
In [9]:
# List of all annotation texts, show it in notebook
annos_text = [anno['text'] for anno in dataviz1_plotly['layout']['annotations']]
annos_text
Out[9]:
['Health Professions',
 'Public Administration',
 'Education',
 'Psychology',
 'Foreign Languages',
 'English',
 'Communications\nand Journalism',
 'Art and Performance',
 'Biology',
 'Agriculture',
 'Social Sciences and History',
 'Business',
 'Math and Statistics',
 'Architecture',
 'Physical Sciences',
 'Computer Science',
 'Engineering',
 "Percentage of Bachelor's degrees conferred to women in the U.S.A., by major (1970-2012)",
 'Data source: nces.ed.gov/programs/digest/2013menu_tables.asp\nAuthor: Randy Olson (randalolson.com / @randal_olson)\nNote: Some majors are missing because the historical data is not available for them']
In [10]:
# List all majors in dataset, show it in notebook
majors = annos_text[:-2]
majors
Out[10]:
['Health Professions',
 'Public Administration',
 'Education',
 'Psychology',
 'Foreign Languages',
 'English',
 'Communications\nand Journalism',
 'Art and Performance',
 'Biology',
 'Agriculture',
 'Social Sciences and History',
 'Business',
 'Math and Statistics',
 'Architecture',
 'Physical Sciences',
 'Computer Science',
 'Engineering']
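The slice annos_text[:-2] relies on the title and the footer being the last two annotations. In the figure object printed above, the major labels use xref='x1' while the title and footer use xref='paper', so selecting by xref is a possible alternative. A minimal sketch, using plain dictionaries as abbreviated stand-ins for the real annotations (the list is shortened and the last two texts are placeholders, not the full strings):

```python
# Abbreviated stand-ins for dataviz1_plotly['layout']['annotations'].
annotations = [
    {"text": "Health Professions", "xref": "x1"},
    {"text": "Public Administration", "xref": "x1"},
    {"text": "Engineering", "xref": "x1"},
    {"text": "(title placeholder)", "xref": "paper"},
    {"text": "(footer placeholder)", "xref": "paper"},
]

# Keep only the annotations that are positioned in data coordinates.
majors_by_xref = [anno["text"] for anno in annotations if anno["xref"] != "paper"]

# The positional slice used in this notebook gives the same result here.
majors_by_slice = [anno["text"] for anno in annotations][:-2]
```

Either approach works for this figure; the xref-based filter simply does not depend on the order in which the annotations were converted.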

And now make a few updates on the Plotly figure object:

In [11]:
# (1) Adjust margins (use our web GUI to find the appropriate values more easily)
dataviz1_plotly['layout']['margin'].update(
    l=50,   # left margin in pixels
    r=160,  # right " " "
    b=100,  # bottom " " "
    t=100   # top " " "
)

# (2) Add title (appears in figure's URL, nice for sharing), remove title annotation
dataviz1_plotly['layout'].update(
    title=annos_text[-2],
    titlefont=dict(size=20)  # increase font size
)
dataviz1_plotly['layout']['annotations'][-2].update(text=' ')

# (3) Remove tick lines
dataviz1_plotly['layout']['xaxis1'].update(ticks='')
dataviz1_plotly['layout']['yaxis1'].update(ticks='')

# (4) Add hover label to data trace, remove hover label from grid traces
N_traces = len(dataviz1_plotly['data'])
N_majors = len(majors)
update_name = [{'name': ' '} for i in range(N_traces)]
update_name[N_traces-N_majors:] = [{'name': major} for major in majors]
dataviz1_plotly['data'].update(update_name)

# (5) Make every y coordinate show when hovering over a given x coordinate
dataviz1_plotly['layout'].update(hovermode='x')
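Step (4) works because the converted figure lists the dashed grid traces first and the data traces last: of the 26 traces printed above, the first 9 are grid lines and the last 17 are majors. The same renaming logic can be sketched with plain dictionaries; the five-trace list below is a shortened stand-in for the real figure data, not the actual traces.

```python
majors = ["Health Professions", "Public Administration", "Education"]

# Shortened stand-in for dataviz1_plotly['data']: 2 grid lines + 3 majors.
traces = [{"name": "_line%d" % i} for i in range(5)]

n_traces = len(traces)
n_majors = len(majors)

# Grid traces get a blank name; the trailing traces get the major names.
new_names = [" "] * (n_traces - n_majors) + majors
for trace, name in zip(traces, new_names):
    trace["name"] = name

names = [trace["name"] for trace in traces]
```

After the loop, `names` starts with two blank entries followed by the three majors, so hovering over a grid line shows no meaningful label.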

Send the updated Plotly figure object to Plotly!

In [12]:
py.iplot(dataviz1_plotly, filename='dataviz1_updated', width=960, height=1120)

In [13]:
from IPython.display import display, HTML
import urllib2
url = 'https://raw.githubusercontent.com/plotly/python-user-guide/master/custom.css'
display(HTML(urllib2.urlopen(url).read()))