This is just a quick exercise for you to review the various plots we showed earlier. Use df3 to replicate the following plots.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df3 = pd.read_csv('df3')
%matplotlib inline
df3.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 500 entries, 0 to 499 Data columns (total 4 columns): a 500 non-null float64 b 500 non-null float64 c 500 non-null float64 d 500 non-null float64 dtypes: float64(4) memory usage: 15.7 KB
df3.head()
a | b | c | d | |
---|---|---|---|---|
0 | 0.336272 | 0.325011 | 0.001020 | 0.401402 |
1 | 0.980265 | 0.831835 | 0.772288 | 0.076485 |
2 | 0.480387 | 0.686839 | 0.000575 | 0.746758 |
3 | 0.502106 | 0.305142 | 0.768608 | 0.654685 |
4 | 0.856602 | 0.171448 | 0.157971 | 0.321231 |
** Recreate this scatter plot of b vs a. Note the color and size of the points. Also note the figure size. See if you can figure out how to stretch it in a similar fashion. Remeber back to your matplotlib lecture...**
df3.plot.scatter(x='a',y='b',color='red',figsize=(12,3),marker='o',s=45,lw=1,edgecolors='black',xlim=(-0.2,1.2),ylim=(-0.2,1.2))
<matplotlib.axes._subplots.AxesSubplot at 0x1e336af9a20>
<matplotlib.axes._subplots.AxesSubplot at 0x1176a7da0>
** Create a histogram of the 'a' column.**
df3['a'].plot.hist(edgecolor='black',xlim=(0,1.0))
<matplotlib.axes._subplots.AxesSubplot at 0x1e337dcdc50>
<matplotlib.axes._subplots.AxesSubplot at 0x1177a2860>
** These plots are okay, but they don't look very polished. Use style sheets to set the style to 'ggplot' and redo the histogram from above. Also figure out how to add more bins to it.***
plt.style.use('ggplot')
df3['a'].plot.hist(edgecolor='white',xlim=(0,1.0),bins=25,alpha=0.5)
<matplotlib.axes._subplots.AxesSubplot at 0x1e3382d4a20>
<matplotlib.axes._subplots.AxesSubplot at 0x11a87b908>
** Create a boxplot comparing the a and b columns.**
df3[['a','b']].plot.box()
<matplotlib.axes._subplots.AxesSubplot at 0x1e338455400>
<matplotlib.axes._subplots.AxesSubplot at 0x1177c4a20>
** Create a kde plot of the 'd' column **
df3['d'].plot.kde(alpha=0.6,xticks=np.arange(-0.5,2.0,0.5))
<matplotlib.axes._subplots.AxesSubplot at 0x1e3388d1240>
<matplotlib.axes._subplots.AxesSubplot at 0x11abb6278>
** Figure out how to increase the linewidth and make the linestyle dashed. (Note: You would usually not dash a kde plot line)**
df3['d'].plot.kde(xticks=np.arange(-0.5,2.0,0.5),ls='--',lw=5)
<matplotlib.axes._subplots.AxesSubplot at 0x1e339b43518>
<matplotlib.axes._subplots.AxesSubplot at 0x11ab9acc0>
** Create an area plot of all the columns for just the rows up to 30. (hint: use .ix).**
df3.plot.area(alpha=0.3,xlim=(0,30),ylim=(0,3.0)).legend(loc='upper right')
<matplotlib.legend.Legend at 0x1e339f8d4a8>
<matplotlib.axes._subplots.AxesSubplot at 0x11ccdfbe0>
Note, you may find this really hard, reference the solutions if you can't figure it out! ** Notice how the legend in our previous figure overlapped some of actual diagram. Can you figure out how to display the legend outside of the plot as shown below?**
** Try searching Google for a good stackoverflow link on this topic. If you can't find it on your own - use this one for a hint.**
f = plt.figure()
df3.plot.area(alpha=0.3,xlim=(0,30),ylim=(0,3.0),ax=f.gca())
plt.legend(loc='center left', bbox_to_anchor=(1.0, 0.5))
plt.show()