This notebook comes in response to this Twitter conversation about converting a certain matplotlib figure featured in Randal S. Olson's blog post (link here and embedded below).
Plotly's matplotlib converter was upgraded in the past week, and the upgrade is now distributed in the latest version of Plotly's Python package, plotly 1.1.2. These upgrades make the conversion of this particular matplotlib plot (shown below) significantly easier.
from IPython.display import IFrame
IFrame("http://www.randalolson.com/2014/06/28/"
"how-to-make-beautiful-data-visualizations-in-python-with-matplotlib/",
720, 400)
Here is the particular figure in question:
IFrame("http://www.randalolson.com/wp-content/uploads/"
"percent-bachelors-degrees-women-usa.png",
1000, 1000)
First, check which version of Plotly's Python API is installed on your machine:
import plotly
plotly.__version__
'1.1.2'
If it is not the latest (version 1.1.2), upgrade using pip:
$ pip install plotly --upgrade
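If you would rather do the version comparison in code than by eye, a minimal sketch follows. The helper names version_tuple and needs_upgrade are hypothetical (they are not part of plotly), and the sketch assumes purely numeric, dot-separated version strings such as '1.1.2'.

```python
def version_tuple(version):
    """Turn a purely numeric version string like '1.1.2' into (1, 1, 2)."""
    return tuple(int(part) for part in version.split("."))


def needs_upgrade(installed, required="1.1.2"):
    """Return True if `installed` is older than `required`."""
    return version_tuple(installed) < version_tuple(required)


# Comparing tuples avoids the pitfalls of comparing strings:
# as strings, '1.10.0' < '1.9.0', but as tuples (1, 10, 0) > (1, 9, 0).
print(needs_upgrade("1.0.9"))   # a hypothetical older version string
print(needs_upgrade("1.1.2"))   # the version used in this notebook
```

In a live session you would pass plotly.__version__ as the first argument.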
Next, if you have a Plotly account as well as a credentials file set up on your machine, signing in to Plotly's servers is done automatically while importing plotly.plotly:
import plotly.plotly as py
For more info on how to sign up or sign in to Plotly, see Plotly's Python API User Guide.
If more convenient, you can manually sign in to Plotly by typing: >>> py.sign_in('your_username','your_api_key')
We also make use of Plotly's tools module in this notebook; import it here:
import plotly.tools as tls
First, remake the original matplotlib figure. The following code cell was copied verbatim from Randal S. Olson's blog post, with the exception of the last line, where we grab the underlying figure object and link it to a variable named dataviz1.
%pylab inline
from pandas import read_csv
# Read the data into a pandas DataFrame.
gender_degree_data = read_csv("http://www.randalolson.com/wp-content/uploads/"
"percent-bachelors-degrees-women-usa.csv")
# These are the "Tableau 20" colors as RGB.
tableau20 = [(31, 119, 180), (174, 199, 232), (255, 127, 14), (255, 187, 120),
(44, 160, 44), (152, 223, 138), (214, 39, 40), (255, 152, 150),
(148, 103, 189), (197, 176, 213), (140, 86, 75), (196, 156, 148),
(227, 119, 194), (247, 182, 210), (127, 127, 127), (199, 199, 199),
(188, 189, 34), (219, 219, 141), (23, 190, 207), (158, 218, 229)]
# Scale the RGB values to the [0, 1] range, which is the format matplotlib accepts.
for i in range(len(tableau20)):
    r, g, b = tableau20[i]
    tableau20[i] = (r / 255., g / 255., b / 255.)
# You typically want your plot to be ~1.33x wider than tall. This plot is a rare
# exception because of the number of lines being plotted on it.
# Common sizes: (10, 7.5) and (12, 9)
figure(figsize=(12, 14))
# Remove the plot frame lines. They are unnecessary chartjunk.
ax = subplot(111)
ax.spines["top"].set_visible(False)
ax.spines["bottom"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.spines["left"].set_visible(False)
# Ensure that the axis ticks only show up on the bottom and left of the plot.
# Ticks on the right and top of the plot are generally unnecessary chartjunk.
ax.get_xaxis().tick_bottom()
ax.get_yaxis().tick_left()
# Limit the range of the plot to only where the data is.
# Avoid unnecessary whitespace.
ylim(0, 90)
xlim(1968, 2014)
# Make sure your axis ticks are large enough to be easily read.
# You don't want your viewers squinting to read your plot.
yticks(range(0, 91, 10), [str(x) + "%" for x in range(0, 91, 10)], fontsize=14)
xticks(fontsize=14)
# Provide tick lines across the plot to help your viewers trace along
# the axis ticks. Make sure that the lines are light and small so they
# don't obscure the primary data lines.
for y in range(10, 91, 10):
    plot(range(1968, 2012), [y] * len(range(1968, 2012)), "--",
         lw=0.5, color="black", alpha=0.3)
# Remove the tick marks; they are unnecessary with the tick lines we just plotted.
plt.tick_params(axis="both", which="both", bottom="off", top="off",
labelbottom="on", left="off", right="off", labelleft="on")
# Now that the plot is prepared, it's time to actually plot the data!
# Note that I plotted the majors in order of the highest % in the final year.
majors = ['Health Professions', 'Public Administration', 'Education', 'Psychology',
'Foreign Languages', 'English', 'Communications\nand Journalism',
'Art and Performance', 'Biology', 'Agriculture',
'Social Sciences and History', 'Business', 'Math and Statistics',
'Architecture', 'Physical Sciences', 'Computer Science',
'Engineering']
for rank, column in enumerate(majors):
    # Plot each line separately with its own color, using the Tableau 20
    # color set in order.
    plot(gender_degree_data.Year.values,
         gender_degree_data[column.replace("\n", " ")].values,
         lw=2.5, color=tableau20[rank])
    # Add a text label to the right end of every line. Most of the code below
    # is adding specific offsets y position because some labels overlapped.
    y_pos = gender_degree_data[column.replace("\n", " ")].values[-1] - 0.5
    if column == "Foreign Languages":
        y_pos += 0.5
    elif column == "English":
        y_pos -= 0.5
    elif column == "Communications\nand Journalism":
        y_pos += 0.75
    elif column == "Art and Performance":
        y_pos -= 0.25
    elif column == "Agriculture":
        y_pos += 1.25
    elif column == "Social Sciences and History":
        y_pos += 0.25
    elif column == "Business":
        y_pos -= 0.75
    elif column == "Math and Statistics":
        y_pos += 0.75
    elif column == "Architecture":
        y_pos -= 0.75
    elif column == "Computer Science":
        y_pos += 0.75
    elif column == "Engineering":
        y_pos -= 0.25
    # Again, make sure that all labels are large enough to be easily read
    # by the viewer.
    text(2011.5, y_pos, column, fontsize=14, color=tableau20[rank])
# matplotlib's title() call centers the title on the plot, but not the graph,
# so I used the text() call to customize where the title goes.
# Make the title big enough so it spans the entire plot, but don't make it
# so big that it requires two lines to show.
# Note that if the title is descriptive enough, it is unnecessary to include
# axis labels; they are self-evident, in this plot's case.
text(1995, 93, "Percentage of Bachelor's degrees conferred to women in the U.S.A."
", by major (1970-2012)", fontsize=17, ha="center")
# Always include your data source(s) and copyright notice! And for your
# data sources, tell your viewers exactly where the data came from,
# preferably with a direct link to the data. Just telling your viewers
# that you used data from the "U.S. Census Bureau" is completely useless:
# the U.S. Census Bureau provides all kinds of data, so how are your
# viewers supposed to know which data set you used?
text(1966, -8, "Data source: nces.ed.gov/programs/digest/2013menu_tables.asp"
"\nAuthor: Randy Olson (randalolson.com / @randal_olson)"
"\nNote: Some majors are missing because the historical data "
"is not available for them", fontsize=10)
# Finally, save the figure as a PNG.
# You can also save it as a PDF, JPEG, etc.
# Just change the file extension in this call.
# bbox_inches="tight" removes all the extra whitespace on the edges of your plot.
#savefig("percent-bachelors-degrees-women-usa.png", bbox_inches="tight");
# (!) Grab figure object and link it to variable (must be in same cell as figure)
dataviz1 = gcf()
Populating the interactive namespace from numpy and matplotlib
Plotly allows you to convert a matplotlib figure object (dataviz1 in our case) into a Plotly figure with one line of code:
py.iplot_mpl(dataviz1, resize=False, filename='dataviz1', width=960, height=1120)
/usr/local/lib/python2.7/dist-packages/plotly/matplotlylib/renderer.py:506: UserWarning: Looks like the annotation(s) you are trying to draw lies/lay outside the given figure size. Therefore, the resulting Plotly figure may not be large enough to view the full text. To adjust the size of the figure, use the 'width' and 'height' keys in the Layout object. Alternatively, use the Margin object to adjust the figure's margins.
/usr/local/lib/python2.7/dist-packages/plotly/tools.py:534: UserWarning: Looks like you used a newline character: '\n'. Plotly uses a subset of HTML escape characters to do things like newline (<br>), bold (<b></b>), italics (<i></i>), etc. Your newline characters have been converted to '<br>' so they will show up right on your Plotly figure!
Here, the resize=False keyword argument tells Plotly not to reset the figure's size to the default Plotly dimensions. The keyword arguments width and height set the dimensions of the display box shown in this notebook.
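The values width=960 and height=1120 are not arbitrary: they match the figsize=(12, 14) set in the matplotlib code at 80 pixels per inch. The figure of 80 is inferred from those numbers rather than stated anywhere in this notebook, and figsize_to_pixels is a hypothetical helper, so treat the following as a sketch:

```python
def figsize_to_pixels(figsize, dpi=80):
    """Convert a matplotlib (width, height) in inches to whole pixels.

    dpi=80 is an assumption inferred from this notebook's numbers.
    """
    width_in, height_in = figsize
    return int(round(width_in * dpi)), int(round(height_in * dpi))


width, height = figsize_to_pixels((12, 14))
print(width, height)
```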
To view the graph in a different tab, click on the data and graph button in the bottom right corner of the plot, which leads you to the figure's unique URL.
While plotting, py.iplot_mpl() emitted two warnings:
The first warning means that the original matplotlib figure contains annotation(s) lying outside the figure's margins. Upon printing (either using savefig() or in matplotlib inline mode), matplotlib adjusts the margins to fit all the annotations. In contrast, running show() would yield truncated annotation(s). So, we will have to adjust the margins slightly.
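To see which annotations are affected, one can look at those that Plotly positions in 'paper' coordinates, where 0 and 1 mark the edges of the plotting area. A minimal sketch on plain dictionaries follows; outside_plot_area is a hypothetical helper, and the sample annotations use rounded positions taken from the converted figure printed further down in this notebook.

```python
def outside_plot_area(annotation):
    """Return True if a 'paper'-referenced annotation is anchored outside [0, 1]."""
    for axis in ("x", "y"):
        if annotation.get(axis + "ref") == "paper":
            if not 0.0 <= annotation[axis] <= 1.0:
                return True
    return False


# Rounded positions of the title, the data-source note and one major label.
title = {"x": 0.586, "y": 1.032, "xref": "paper", "yref": "paper"}
source_note = {"x": -0.043, "y": -0.089, "xref": "paper", "yref": "paper"}
label = {"x": 2011.5, "y": 84.3, "xref": "x1", "yref": "y1"}

flags = [outside_plot_area(a) for a in (title, source_note, label)]
print(flags)
```

Labels positioned in data coordinates are not checked by this sketch; their text can still run past the right edge of the plotting area, which is one reason to widen the right margin.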
The second warning is about newlines: Plotly uses a subset of HTML syntax to insert new lines in strings. In version 1.1.2 of the Python API, all \n escape sequences are converted to <br> when sent to Plotly, so that multi-line strings render as desired.
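The effect of that conversion can be reproduced with a plain string replacement. This is only an illustration of the result, not Plotly's actual implementation, and newlines_to_br is a hypothetical name:

```python
def newlines_to_br(text):
    """Replace newline characters with the HTML line-break tag."""
    return text.replace("\n", "<br>")


label = newlines_to_br("Communications\nand Journalism")
print(label)
```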
Moreover, looking more closely at the Plotly figure:
Converting ticks (or the lack of ticks) is still an issue (one that we are currently trying to fix), so we will have to remove them in this Python session.
The title is not exactly at the same position as in the original matplotlib figure. Therefore, we will add a Plotly title, which additionally will make the figure's URL more descriptive.
And finally, we will make full use of Plotly's interactivity by adding hover text to the data traces.
So, first let's convert the matplotlib figure object to a Plotly figure object:
dataviz1_plotly = tls.mpl_to_plotly(dataviz1)
print dataviz1_plotly.to_string() # show plotly figure object in notebook
Figure( data=Data([ Scatter( x=[1968.0, 1969.0, 1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975...], y=[10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, ...], mode='lines', name='_line0', line=Line( color='#000000', width=0.5, dash='dash', opacity=0.3 ), xaxis='x1', yaxis='y1' ), Scatter( x=[1968.0, 1969.0, 1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975...], y=[20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, ...], mode='lines', name='_line1', line=Line( color='#000000', width=0.5, dash='dash', opacity=0.3 ), xaxis='x1', yaxis='y1' ), Scatter( x=[1968.0, 1969.0, 1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975...], y=[30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, ...], mode='lines', name='_line2', line=Line( color='#000000', width=0.5, dash='dash', opacity=0.3 ), xaxis='x1', yaxis='y1' ), Scatter( x=[1968.0, 1969.0, 1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975...], y=[40.0, 40.0, 40.0, 40.0, 40.0, 40.0, 40.0, 40.0, 40.0, 40.0, ...], mode='lines', name='_line3', line=Line( color='#000000', width=0.5, dash='dash', opacity=0.3 ), xaxis='x1', yaxis='y1' ), Scatter( x=[1968.0, 1969.0, 1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975...], y=[50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, ...], mode='lines', name='_line4', line=Line( color='#000000', width=0.5, dash='dash', opacity=0.3 ), xaxis='x1', yaxis='y1' ), Scatter( x=[1968.0, 1969.0, 1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975...], y=[60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, ...], mode='lines', name='_line5', line=Line( color='#000000', width=0.5, dash='dash', opacity=0.3 ), xaxis='x1', yaxis='y1' ), Scatter( x=[1968.0, 1969.0, 1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975...], y=[70.0, 70.0, 70.0, 70.0, 70.0, 70.0, 70.0, 70.0, 70.0, 70.0, ...], mode='lines', name='_line6', line=Line( color='#000000', width=0.5, dash='dash', opacity=0.3 ), xaxis='x1', yaxis='y1' ), Scatter( x=[1968.0, 1969.0, 1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975...], y=[80.0, 80.0, 
80.0, 80.0, 80.0, 80.0, 80.0, 80.0, 80.0, 80.0, ...], mode='lines', name='_line7', line=Line( color='#000000', width=0.5, dash='dash', opacity=0.3 ), xaxis='x1', yaxis='y1' ), Scatter( x=[1968.0, 1969.0, 1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975...], y=[90.0, 90.0, 90.0, 90.0, 90.0, 90.0, 90.0, 90.0, 90.0, 90.0, ...], mode='lines', name='_line8', line=Line( color='#000000', width=0.5, dash='dash', opacity=0.3 ), xaxis='x1', yaxis='y1' ), Scatter( x=[1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975.0, 1976.0, 1977...], y=[77.099999999999994, 75.5, 76.900000000000006, 77.40000000000...], mode='lines', name='_line9', line=Line( color='#1F77B4', width=2.5, dash='solid', opacity=1 ), xaxis='x1', yaxis='y1' ), Scatter( x=[1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975.0, 1976.0, 1977...], y=[68.400000000000006, 65.5, 62.600000000000001, 64.29999999999...], mode='lines', name='_line10', line=Line( color='#AEC7E8', width=2.5, dash='solid', opacity=1 ), xaxis='x1', yaxis='y1' ), Scatter( x=[1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975.0, 1976.0, 1977...], y=[74.535327580000001, 74.149203689999993, 73.554519959999993, ...], mode='lines', name='_line11', line=Line( color='#FF7F0E', width=2.5, dash='solid', opacity=1 ), xaxis='x1', yaxis='y1' ), Scatter( x=[1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975.0, 1976.0, 1977...], y=[44.399999999999999, 46.200000000000003, 47.600000000000001, ...], mode='lines', name='_line12', line=Line( color='#FFBB78', width=2.5, dash='solid', opacity=1 ), xaxis='x1', yaxis='y1' ), Scatter( x=[1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975.0, 1976.0, 1977...], y=[73.799999999999997, 73.900000000000006, 74.599999999999994, ...], mode='lines', name='_line13', line=Line( color='#2CA02C', width=2.5, dash='solid', opacity=1 ), xaxis='x1', yaxis='y1' ), Scatter( x=[1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975.0, 1976.0, 1977...], y=[65.570923429999993, 64.556485159999994, 63.664263200000001, ...], mode='lines', name='_line14', line=Line( color='#98DF8A', 
width=2.5, dash='solid', opacity=1 ), xaxis='x1', yaxis='y1' ), Scatter( x=[1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975.0, 1976.0, 1977...], y=[35.299999999999997, 35.5, 36.600000000000001, 38.39999999999...], mode='lines', name='_line15', line=Line( color='#D62728', width=2.5, dash='solid', opacity=1 ), xaxis='x1', yaxis='y1' ), Scatter( x=[1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975.0, 1976.0, 1977...], y=[59.700000000000003, 59.899999999999999, 60.399999999999999, ...], mode='lines', name='_line16', line=Line( color='#FF9896', width=2.5, dash='solid', opacity=1 ), xaxis='x1', yaxis='y1' ), Scatter( x=[1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975.0, 1976.0, 1977...], y=[29.088362969999999, 29.394402849999999, 29.810221049999999, ...], mode='lines', name='_line17', line=Line( color='#9467BD', width=2.5, dash='solid', opacity=1 ), xaxis='x1', yaxis='y1' ), Scatter( x=[1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975.0, 1976.0, 1977...], y=[4.2297979799999998, 5.452796685, 7.4207102200000001, 9.65360...], mode='lines', name='_line18', line=Line( color='#C5B0D5', width=2.5, dash='solid', opacity=1 ), xaxis='x1', yaxis='y1' ), Scatter( x=[1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975.0, 1976.0, 1977...], y=[36.799999999999997, 36.200000000000003, 36.100000000000001, ...], mode='lines', name='_line19', line=Line( color='#8C564B', width=2.5, dash='solid', opacity=1 ), xaxis='x1', yaxis='y1' ), Scatter( x=[1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975.0, 1976.0, 1977...], y=[9.0644389749999998, 9.5031865939999989, 10.5589621, 12.80460...], mode='lines', name='_line20', line=Line( color='#C49C94', width=2.5, dash='solid', opacity=1 ), xaxis='x1', yaxis='y1' ), Scatter( x=[1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975.0, 1976.0, 1977...], y=[38.0, 39.0, 40.200000000000003, 40.899999999999999, 41.79999...], mode='lines', name='_line21', line=Line( color='#E377C2', width=2.5, dash='solid', opacity=1 ), xaxis='x1', yaxis='y1' ), Scatter( x=[1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 
1975.0, 1976.0, 1977...], y=[11.921005389999999, 12.003105590000001, 13.21459351, 14.7916...], mode='lines', name='_line22', line=Line( color='#F7B6D2', width=2.5, dash='solid', opacity=1 ), xaxis='x1', yaxis='y1' ), Scatter( x=[1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975.0, 1976.0, 1977...], y=[13.800000000000001, 14.9, 14.800000000000001, 16.5, 18.19999...], mode='lines', name='_line23', line=Line( color='#7F7F7F', width=2.5, dash='solid', opacity=1 ), xaxis='x1', yaxis='y1' ), Scatter( x=[1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975.0, 1976.0, 1977...], y=[13.6, 13.6, 14.9, 16.399999999999999, 18.899999999999999, 19...], mode='lines', name='_line24', line=Line( color='#C7C7C7', width=2.5, dash='solid', opacity=1 ), xaxis='x1', yaxis='y1' ), Scatter( x=[1970.0, 1971.0, 1972.0, 1973.0, 1974.0, 1975.0, 1976.0, 1977...], y=[0.80000000000000004, 1.0, 1.2, 1.6000000000000001, 2.2000000...], mode='lines', name='_line25', line=Line( color='#BCBD22', width=2.5, dash='solid', opacity=1 ), xaxis='x1', yaxis='y1' ) ]), layout=Layout( showlegend=False, autosize=False, width=960, height=1120, annotations=Annotations([ Annotation( x=2011.5, y=84.3, xref='x1', yref='y1', text='Health Professions', font=Font( size=14.0, color='#1F77B4' ), align='left', showarrow=False, opacity=1, xanchor='left', yanchor='bottom' ), Annotation( x=2011.5, y=81.4, xref='x1', yref='y1', text='Public Administration', font=Font( size=14.0, color='#AEC7E8' ), align='left', showarrow=False, opacity=1, xanchor='left', yanchor='bottom' ), Annotation( x=2011.5, y=78.93281184, xref='x1', yref='y1', text='Education', font=Font( size=14.0, color='#FF7F0E' ), align='left', showarrow=False, opacity=1, xanchor='left', yanchor='bottom' ), Annotation( x=2011.5, y=76.2, xref='x1', yref='y1', text='Psychology', font=Font( size=14.0, color='#FFBB78' ), align='left', showarrow=False, opacity=1, xanchor='left', yanchor='bottom' ), Annotation( x=2011.5, y=69.5, xref='x1', yref='y1', text='Foreign Languages', font=Font( 
size=14.0, color='#2CA02C' ), align='left', showarrow=False, opacity=1, xanchor='left', yanchor='bottom' ), Annotation( x=2011.5, y=67.42673015, xref='x1', yref='y1', text='English', font=Font( size=14.0, color='#98DF8A' ), align='left', showarrow=False, opacity=1, xanchor='left', yanchor='bottom' ), Annotation( x=2011.5, y=62.45, xref='x1', yref='y1', text='Communications\nand Journalism', font=Font( size=14.0, color='#D62728' ), align='left', showarrow=False, opacity=1, xanchor='left', yanchor='bottom' ), Annotation( x=2011.5, y=60.45, xref='x1', yref='y1', text='Art and Performance', font=Font( size=14.0, color='#FF9896' ), align='left', showarrow=False, opacity=1, xanchor='left', yanchor='bottom' ), Annotation( x=2011.5, y=58.2423969, xref='x1', yref='y1', text='Biology', font=Font( size=14.0, color='#9467BD' ), align='left', showarrow=False, opacity=1, xanchor='left', yanchor='bottom' ), Annotation( x=2011.5, y=50.78718193, xref='x1', yref='y1', text='Agriculture', font=Font( size=14.0, color='#C5B0D5' ), align='left', showarrow=False, opacity=1, xanchor='left', yanchor='bottom' ), Annotation( x=2011.5, y=48.95, xref='x1', yref='y1', text='Social Sciences and History', font=Font( size=14.0, color='#8C564B' ), align='left', showarrow=False, opacity=1, xanchor='left', yanchor='bottom' ), Annotation( x=2011.5, y=46.93041792, xref='x1', yref='y1', text='Business', font=Font( size=14.0, color='#C49C94' ), align='left', showarrow=False, opacity=1, xanchor='left', yanchor='bottom' ), Annotation( x=2011.5, y=43.35, xref='x1', yref='y1', text='Math and Statistics', font=Font( size=14.0, color='#E377C2' ), align='left', showarrow=False, opacity=1, xanchor='left', yanchor='bottom' ), Annotation( x=2011.5, y=41.5234375, xref='x1', yref='y1', text='Architecture', font=Font( size=14.0, color='#F7B6D2' ), align='left', showarrow=False, opacity=1, xanchor='left', yanchor='bottom' ), Annotation( x=2011.5, y=39.6, xref='x1', yref='y1', text='Physical Sciences', font=Font( 
size=14.0, color='#7F7F7F' ), align='left', showarrow=False, opacity=1, xanchor='left', yanchor='bottom' ), Annotation( x=2011.5, y=18.45, xref='x1', yref='y1', text='Computer Science', font=Font( size=14.0, color='#C7C7C7' ), align='left', showarrow=False, opacity=1, xanchor='left', yanchor='bottom' ), Annotation( x=2011.5, y=16.75, xref='x1', yref='y1', text='Engineering', font=Font( size=14.0, color='#BCBD22' ), align='left', showarrow=False, opacity=1, xanchor='left', yanchor='bottom' ), Annotation( x=0.58616866063612849, y=1.032144227080936, xref='paper', yref='paper', text="Percentage of Bachelor's degrees conferred to women i...", font=Font( size=17.0, color='#000000' ), align='center', showarrow=False, opacity=1, xanchor='center', yanchor='bottom' ), Annotation( x=-0.043419900787855584, y=-0.088786600179005234, xref='paper', yref='paper', text='Data source: nces.ed.gov/programs/digest/2013menu_tab...', font=Font( size=10.0, color='#000000' ), align='left', showarrow=False, opacity=1, xanchor='left', yanchor='bottom' ) ]), margin=Margin( l=120, r=95, b=140, t=111, pad=0 ), hovermode='closest', xaxis1=XAxis( range=[1968.0, 2014.0], domain=[0.0, 1.0], type='linear', showgrid=False, zeroline=False, showline=False, nticks=7, ticks='inside', tickfont=Font( size=14.0 ), anchor='y1', side='bottom', mirror=False ), yaxis1=YAxis( range=[0.0, 90.0], domain=[0.0, 1.0], type='linear', showgrid=False, zeroline=False, showline=False, autotick=False, ticks='inside', tick0=0, dtick=10, tickfont=Font( size=14.0 ), anchor='x1', side='left', mirror=False ) ) )
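Note that the colors in the printed figure are hex strings (for example '#1F77B4'), whereas the matplotlib code used RGB tuples scaled to [0, 1]. The two encode the same Tableau 20 colors. A minimal sketch of the mapping, with rgb01_to_hex as a hypothetical helper rather than the converter's own code:

```python
def rgb01_to_hex(rgb):
    """Convert an (r, g, b) tuple of floats in [0, 1] to an uppercase hex string."""
    channels = [int(round(value * 255)) for value in rgb]
    return "#{:02X}{:02X}{:02X}".format(*channels)


# First Tableau 20 color, scaled the same way as in the matplotlib cell.
first_color = rgb01_to_hex((31 / 255., 119 / 255., 180 / 255.))
print(first_color)
```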
# List of all annotation texts, show it in notebook
annos_text = [anno['text'] for anno in dataviz1_plotly['layout']['annotations']]
annos_text
['Health Professions', 'Public Administration', 'Education', 'Psychology', 'Foreign Languages', 'English', 'Communications\nand Journalism', 'Art and Performance', 'Biology', 'Agriculture', 'Social Sciences and History', 'Business', 'Math and Statistics', 'Architecture', 'Physical Sciences', 'Computer Science', 'Engineering', "Percentage of Bachelor's degrees conferred to women in the U.S.A., by major (1970-2012)", 'Data source: nces.ed.gov/programs/digest/2013menu_tables.asp\nAuthor: Randy Olson (randalolson.com / @randal_olson)\nNote: Some majors are missing because the historical data is not available for them']
# List all majors in dataset, show it in notebook
majors = annos_text[:-2]
majors
['Health Professions', 'Public Administration', 'Education', 'Psychology', 'Foreign Languages', 'English', 'Communications\nand Journalism', 'Art and Performance', 'Biology', 'Agriculture', 'Social Sciences and History', 'Business', 'Math and Statistics', 'Architecture', 'Physical Sciences', 'Computer Science', 'Engineering']
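Slicing off the last two entries works because the title and the data-source note come last in the list. An alternative that does not depend on ordering is to use a fact visible in the printed figure: the major labels are positioned in data coordinates (xref='x1'), while the title and the note use xref='paper'. A sketch on plain dictionaries, with shortened placeholder data:

```python
# Shortened stand-ins for the figure's annotations (texts abbreviated).
annotations = [
    {"text": "Health Professions", "xref": "x1"},
    {"text": "Engineering", "xref": "x1"},
    {"text": "Percentage of Bachelor's degrees ...", "xref": "paper"},
    {"text": "Data source: ...", "xref": "paper"},
]

# Keep only the annotations anchored to the data axes: these are the labels.
major_labels = [a["text"] for a in annotations if a["xref"] != "paper"]
print(major_labels)
```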
And now make a few updates to the Plotly figure object:
# (1) Adjust margins (use our web GUI to find the appropriate values more easily)
dataviz1_plotly['layout']['margin'].update(
l=50, # left margin in pixels
r=160, # right " " "
b=100, # bottom " " "
t=100 # top " " "
)
# (2) Add title (appears in figure's URL, nice for sharing), remove title annotation
dataviz1_plotly['layout'].update(
title=annos_text[-2],
titlefont=dict(size=20) # increase font size
)
dataviz1_plotly['layout']['annotations'][-2].update(text=' ')
# (3) Remove tick lines
dataviz1_plotly['layout']['xaxis1'].update(ticks='')
dataviz1_plotly['layout']['yaxis1'].update(ticks='')
# (4) Add hover label to data trace, remove hover label from grid traces
N_traces = len(dataviz1_plotly['data'])
N_majors = len(majors)
update_name = [{'name': ' '} for i in range(N_traces)]
update_name[N_traces-N_majors:] = [{'name': major} for major in majors]
dataviz1_plotly['data'].update(update_name)
# (5) Make every y coordinate show when hovering over a given x coordinate
dataviz1_plotly['layout'].update(hovermode='x')
py.iplot(dataviz1_plotly, filename='dataviz1_updated', width=960, height=1120)
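Step (4) relies on the order of the traces: the nine dashed grid lines were plotted first, followed by the seventeen majors, for 26 traces in total. The following sketch reproduces the construction of the list of name updates with plain Python objects (and a shortened, illustrative list of majors), so the slicing can be checked without a Plotly figure:

```python
n_grid_lines = 9  # dashed lines at y = 10, 20, ..., 90
majors = ["Health Professions", "Public Administration", "Education"]  # shortened
n_traces = n_grid_lines + len(majors)

# Start with a blank name for every trace ...
update_name = [{"name": " "} for _ in range(n_traces)]
# ... then overwrite the trailing entries, which correspond to the majors.
update_name[n_traces - len(majors):] = [{"name": major} for major in majors]

print([entry["name"] for entry in update_name])
```

Because the slice assignment replaces exactly as many entries as there are majors, the list keeps one entry per trace, with blank names on the grid lines so that they show no label on hover.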
Refer to
About Plotly
Big thanks to
from IPython.display import display, HTML
import urllib2
url = 'https://raw.githubusercontent.com/plotly/python-user-guide/master/custom.css'
display(HTML(urllib2.urlopen(url).read()))