#!/usr/bin/env python
# coding: utf-8

# # Introduction
# IPython, pandas and matplotlib have a number of useful options that make it easier to view and format your data. This notebook collects a bunch of them in one place. I hope it will be a useful reference.
#
# The original blog posting is on http://pbpython.com/ipython-pandas-display-tips.html

# ## Import modules and some sample data
# First, do our standard pandas, numpy and matplotlib imports and configure inline display of plots.

# In[1]:


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
get_ipython().run_line_magic('matplotlib', 'inline')


# One of the simple things we can do is override the default CSS to customize our DataFrame output.
#
# This specific example is from - [Brandon Rhodes' talk at pycon](https://www.youtube.com/watch?v=5JnMutdy6Fw "Pandas From The Ground Up")

# For the purposes of the notebook, I'm defining the CSS as a variable, but you could easily read it in from a file as well.

# In[2]:


CSS = """
body {
    margin: 0;
    font-family: Helvetica;
}
table.dataframe {
    border-collapse: collapse;
    border: none;
}
table.dataframe tr {
    border: none;
}
table.dataframe td, table.dataframe th {
    margin: 0;
    border: 1px solid white;
    padding-left: 0.25em;
    padding-right: 0.25em;
}
table.dataframe th:not(:empty) {
    background-color: #fec;
    text-align: left;
    font-weight: normal;
}
table.dataframe tr:nth-child(2) th:empty {
    border-left: none;
    border-right: 1px dashed #888;
}
table.dataframe td {
    border: 2px solid #ccf;
    background-color: #f4f4ff;
}
"""


# Now add this CSS into the current notebook's HTML.

# In[3]:


from IPython.display import HTML

# Wrap the CSS in a <style> tag so the notebook applies it to the page.
HTML('<style>{}</style>'.format(CSS))


# In[4]:


# parse_dates expects a boolean or a list of columns, not the string 'True'.
SALES = pd.read_csv("sample-sales-tax.csv", parse_dates=True)
SALES.head()


# You can see how the CSS is now applied to the DataFrame and how you could easily modify it to your liking.
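
# The CSS above is defined inline, but as mentioned earlier it could be read from a file instead.
# The cell below is a minimal sketch of that idea and is not part of the original post.
# The helper name `style_tag_from_file` and the file name `custom.css` are hypothetical,
# chosen only for illustration.

# In[ ]:


from pathlib import Path


def style_tag_from_file(css_path):
    """Read a CSS file and return its contents wrapped in a <style> tag."""
    css_text = Path(css_path).read_text(encoding="utf-8")
    return "<style>{}</style>".format(css_text)


# Usage inside a notebook (assumes a file named custom.css sits next to the notebook):
# from IPython.display import HTML
# HTML(style_tag_from_file("custom.css"))
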
#
# Jupyter notebooks do a good job of automatically displaying information, but sometimes you want to force data to display. Fortunately, IPython provides an option. This is especially useful if you want to display multiple DataFrames from one cell.

# In[5]:


from IPython.display import display


# In[6]:


display(SALES.head(2))
display(SALES.tail(2))
display(SALES.describe())


# ## Using pandas settings to control output

# Pandas has many different options to control how data is displayed.
#
# You can use max_rows to control how many rows are displayed.

# In[7]:


pd.set_option("display.max_rows", 4)


# In[8]:


SALES


# Depending on the data set, you may only want to display a smaller number of columns.

# In[9]:


pd.set_option("display.max_columns", 6)


# In[10]:


SALES


# You can control how many decimal places are displayed.

# In[11]:


# Use the full option name; the short form 'precision' can be ambiguous in newer pandas versions.
pd.set_option('display.precision', 2)


# In[12]:


SALES


# In[13]:


pd.set_option('display.precision', 7)


# In[14]:


SALES


# You can also format floating point numbers using float_format.

# In[15]:


pd.set_option('display.float_format', '{:.2f}'.format)


# In[16]:


SALES


# Keep in mind that this setting applies to every floating point value in the output. In our example, putting a dollar sign in front of all of them would not be correct.

# In[17]:


pd.set_option('display.float_format', '${:.2f}'.format)


# In[18]:


SALES


# ## Third Party Plugins

# Quantopian has a useful plugin called qgrid - https://github.com/quantopian/qgrid
#
# Import it and install it.

# In[19]:


import qgrid
qgrid.nbinstall()


# Showing the data is straightforward.

# In[22]:


qgrid.show_grid(SALES, remote_js=True)


# The plugin works much like an Excel autofilter. It can be handy for quickly filtering and sorting your data.

# ## Improving your plots

# I have mentioned before how the default pandas plots don't look so great. Fortunately, there are style sheets in matplotlib which go a long way towards improving the visualization of your data.
#
# Here is a simple plot with the default values.

# In[23]:


SALES.groupby('name')['quantity'].sum().plot(kind="bar")


# We can use some of the matplotlib styles available to us to make this look better.
# http://matplotlib.org/users/style_sheets.html

# In[24]:


plt.style.use('ggplot')


# In[25]:


SALES.groupby('name')['quantity'].sum().plot(kind="bar")


# You can see all the styles available.

# In[26]:


plt.style.available


# In[27]:


plt.style.use('bmh')


# In[28]:


SALES.groupby('name')['quantity'].sum().plot(kind="bar")


# In[29]:


plt.style.use('fivethirtyeight')


# In[30]:


SALES.groupby('name')['quantity'].sum().plot(kind="bar")


# Each of the styles makes subtle (and not so subtle) changes. Fortunately, it is easy to experiment with them on your own plots.
#
#
# You can find other articles at [Practical Business Python](http://pbpython.com)
#
# This notebook is referenced in the following post - http://pbpython.com/ipython-pandas-display-tips.html