Qgrid is a Jupyter notebook widget that uses a JavaScript library called SlickGrid to render pandas DataFrames within a Jupyter notebook. It was developed for use in Quantopian's hosted research environment.
The purpose of this notebook is to give an overview of what qgrid is capable of. Execute the cells below to generate some qgrids using a diverse set of DataFrames.
SlickGrid lets users sort and filter hundreds of thousands of rows with extreme responsiveness.
Qgrid renders pandas DataFrames as SlickGrids, which lets users explore the entire contents of a DataFrame using intuitive sorting and filtering controls. It's built on the ipywidgets framework and is designed to be used in Jupyter Notebook, JupyterHub, or JupyterLab.
Column options can be specified through the show_grid method. Options can be provided for all columns via the column_options parameter, and for individual columns via the column_definitions parameter.
The edit_cell, change_selection, and toggle_editable methods update the state of an existing grid widget without having to call show_grid again.
The add_row method lets the caller specify the values for the new row via the row parameter. This allows rows to be added to a qgrid instance even if it's showing a DataFrame that doesn't have an integer index.
The remove_row method optionally accepts the indices of the rows to remove via the rows parameter.
Events can be listened for on all instances (using qgrid.on) as well as on individual instances (using QgridWidget.on).
Previously, events were handled by watching the _df attribute using the observe method (i.e. qgrid_widget.observe(handle_df_changed, names=['_df'])). This approach no longer works for most events (scrolling, sorting, filtering, etc.), so the new QgridWidget.on method should be used instead.
API documentation is hosted on readthedocs.
The API documentation can also be accessed via the "?" operator in IPython. To use the "?" operator, type the name of the function followed by "?" to see the documentation for that function, like this:
qgrid.show_grid?
qgrid.set_defaults?
qgrid.set_grid_option?
qgrid.enable?
qgrid.disable?
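The "?" operator only exists in IPython. Outside it, the same docstrings can be read with the standard library, as in the following minimal sketch. The doc_summary helper is a hypothetical name introduced for illustration, and the qgrid import is guarded so the cell still runs where qgrid is not installed.

```python
import inspect

try:
    import qgrid
    documented = qgrid.show_grid
except ImportError:
    # Fallback so the sketch still runs where qgrid is not installed.
    documented = inspect.getdoc


def doc_summary(obj):
    """Return the first line of an object's docstring, or '' if it has none."""
    doc = inspect.getdoc(obj)
    return doc.splitlines()[0] if doc else ''


# Print a one-line summary of the chosen function's documentation.
print(doc_summary(documented))

# help(documented) prints the full docstring, much like `documented?` in IPython.
```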
import numpy as np
import pandas as pd
import qgrid
randn = np.random.randn
df_types = pd.DataFrame({
    'A': pd.Series(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
                    '2013-01-05', '2013-01-06', '2013-01-07', '2013-01-08', '2013-01-09'],
                   index=list(range(9)), dtype='datetime64[ns]'),
    'B': pd.Series(randn(9), index=list(range(9)), dtype='float32'),
    'C': pd.Categorical(["washington", "adams", "washington", "madison", "lincoln",
                         "jefferson", "hamilton", "roosevelt", "kennedy"]),
    'D': ["foo", "bar", "buzz", "bippity", "boppity", "foo", "foo", "bar", "zoo"],
})
df_types['E'] = df_types['D'] == 'foo'
qgrid_widget = qgrid.show_grid(df_types, show_toolbar=True)
qgrid_widget
If you make any sorting/filtering changes, or edit the grid by double-clicking, you can retrieve a copy of your DataFrame that reflects these changes by calling get_changed_df on the QgridWidget instance returned by show_grid.
qgrid_widget.get_changed_df()
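Instead of polling get_changed_df, a handler can be attached with the QgridWidget.on method so that it runs whenever the grid changes. The following is a minimal sketch rather than a definitive recipe. The event names ('cell_edited', 'sort_changed', 'filter_changed') and the handler signature are assumed from qgrid's API documentation, while df_events, event_log, and handle_event are hypothetical names. The qgrid import is guarded so the cell still runs where qgrid is not installed.

```python
import pandas as pd

try:
    import qgrid
except ImportError:  # keeps the sketch runnable where qgrid is not installed
    qgrid = None

event_log = []


def handle_event(event, widget):
    """Record the name of each event that qgrid reports."""
    event_log.append(event['name'])


df_events = pd.DataFrame({'A': [1, 2, 3], 'B': ['x', 'y', 'z']})

if qgrid is not None:
    events_widget = qgrid.show_grid(df_events)
    # Assumed event names; see the qgrid API docs for the full list.
    events_widget.on(['cell_edited', 'sort_changed', 'filter_changed'], handle_event)

# The handler is an ordinary function, so it can be exercised by hand with a
# dict shaped like a qgrid event (only the 'name' key is relied on here).
handle_event({'name': 'sort_changed'}, None)
print(event_log)
```

Keeping the handler free of widget-specific logic, as above, makes it easy to check in isolation before wiring it to a live grid.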
Note: The "import" statements are repeated in the next cell (and many subsequent cells) so that the cells can be run in any order.
import pandas as pd
import numpy as np
import qgrid
# set the default max number of rows to 10 so the larger DataFrames we render don't take up too much space
qgrid.set_grid_option('maxVisibleRows', 10)
df_scale = pd.DataFrame(np.random.randn(1000000, 4), columns=list('ABCD'))
# duplicate column B as a string column, to test scalability for text column filters
df_scale['B (as str)'] = df_scale['B'].map(str)
q_scale = qgrid.show_grid(df_scale, show_toolbar=True, grid_options={'forceFitColumns': False, 'defaultColumnWidth': 200})
q_scale
q_scale.get_changed_df()
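The grid_options argument used above applies to the grid as a whole. Per-column behaviour goes through the column_options and column_definitions parameters of show_grid instead. The sketch below disables editing on every column except 'A'. It assumes that an entry in column_definitions overrides column_options for that column, and the effective_options helper is a hypothetical illustration of that precedence, not part of qgrid. The qgrid import is guarded so the cell still runs where qgrid is not installed.

```python
import pandas as pd

try:
    import qgrid
except ImportError:  # keeps the sketch runnable where qgrid is not installed
    qgrid = None

df_cols = pd.DataFrame({'A': [1, 2, 3], 'B': [4.0, 5.0, 6.0]})

# Defaults applied to every column.
col_opts = {'editable': False}
# Per-column overrides, keyed by column name.
col_defs = {'A': {'editable': True}}


def effective_options(column):
    """Hypothetical helper: merge the defaults with one column's overrides."""
    return {**col_opts, **col_defs.get(column, {})}


if qgrid is not None:
    cols_widget = qgrid.show_grid(df_cols,
                                  column_options=col_opts,
                                  column_definitions=col_defs)

for name in df_cols.columns:
    print(name, effective_options(name))
```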
import pandas as pd
import numpy as np
import qgrid
randn = np.random.randn
# Get a pandas DataFrame containing the daily prices for the S&P 500 from 1/1/2014 - 1/1/2017
from pandas_datareader.data import DataReader
spy = DataReader(
'SPY',
'yahoo',
pd.Timestamp('2014-01-01'),
pd.Timestamp('2017-01-01'),
)
# Tell qgrid to automatically render all DataFrames and Series as qgrids.
qgrid.enable()
# Render the DataFrame as a qgrid automatically
spy
# Disable automatic display so we can display DataFrames in the normal way
qgrid.disable()
Create a sample DataFrame using the wb.download function and render it both with qgrid and without it
import qgrid
import pandas as pd
from pandas_datareader import wb
df_countries = wb.download(indicator='NY.GDP.PCAP.KD', country=['all'], start=2005, end=2008)
df_countries.columns = ['GDP per capita (constant 2005 US$)']
qgrid.show_grid(df_countries)
df_countries
Create a sample DataFrame with an interval column (built with pd.cut) and render it both with qgrid and without it
import numpy as np
import pandas as pd
import qgrid
td = np.cumsum(np.random.randint(1, 15*60, 1000))
start = pd.Timestamp('2017-04-17')
df_interval = pd.DataFrame(
[(start + pd.Timedelta(seconds=d)) for d in td],
columns=['time'])
freq = '15Min'
start = df_interval['time'].min().floor(freq)
end = df_interval['time'].max().ceil(freq)
bins = pd.date_range(start, end, freq=freq)
df_interval['time_bin'] = pd.cut(df_interval['time'], bins)
qgrid.show_grid(df_interval, show_toolbar=True)
df_interval
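To make the contents of the time_bin column easier to see, the following pandas-only sketch repeats the binning step on four fixed timestamps instead of random ones. The timestamps and the names times and counts are illustrative assumptions. Each bin is a right-closed 15-minute interval, so a timestamp falling exactly on the lowest edge would not be assigned to any bin.

```python
import pandas as pd

# Four fixed timestamps (illustrative values) instead of random ones.
times = pd.Series(pd.to_datetime([
    '2017-04-17 00:03:00',
    '2017-04-17 00:14:59',
    '2017-04-17 00:20:00',
    '2017-04-17 00:47:30',
]), name='time')

freq = '15min'
# Bin edges span from the floor of the earliest time to the ceiling of the latest.
bins = pd.date_range(times.min().floor(freq), times.max().ceil(freq), freq=freq)

# pd.cut assigns each timestamp to a right-closed interval (left, right].
time_bin = pd.cut(times, bins)

# Count rows per interval, keeping the bins in chronological order.
counts = time_bin.value_counts(sort=False)
print(counts)
```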
Create a sample DataFrame with a two-level multi-index and render it both with qgrid and without it
import numpy as np
import pandas as pd
import qgrid
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
df_multi = pd.DataFrame(np.random.randn(8, 4), index=arrays)
qgrid.show_grid(df_multi, show_toolbar=True)
df_multi
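Passing a list of arrays as the index, as above, produces a two-level MultiIndex. The pandas-only sketch below builds a smaller frame the same way, names the levels, and flattens them into ordinary columns with reset_index, which can help when comparing the grid against the underlying data. The level names 'group' and 'item' and the column names 'x' and 'y' are assumptions for illustration; the example above leaves them unnamed.

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz'],
          ['one', 'two', 'one', 'two']]

# Level names are illustrative; they are not part of the original example.
index = pd.MultiIndex.from_arrays(arrays, names=['group', 'item'])
df_named = pd.DataFrame(np.random.randn(4, 2), index=index, columns=['x', 'y'])

# reset_index() turns both index levels into ordinary columns.
df_flat = df_named.reset_index()

print(df_named.index.nlevels)
print(list(df_flat.columns))
```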
Create a sample DataFrame with a single column of random integers (which the grid shows alongside the index column) using randint, and set the widget's layout to 20% of the width of the output area.
import numpy as np
import pandas as pd
import qgrid
import ipywidgets as ipyw
randn = np.random.randn
df_types = pd.DataFrame(np.random.randint(1,14,14))
qgrid_widget = qgrid.show_grid(df_types, show_toolbar=False)
qgrid_widget.layout = ipyw.Layout(width='20%')
qgrid_widget
import pandas as pd
import qgrid
df = pd.DataFrame({'A': [1.2, 'xy', 4], 'B': [3, 4, 5]})
df = df.set_index(pd.Index(['yz', 7, 3.2]))
view = qgrid.show_grid(df)
view
import pandas as pd
import qgrid
range_index = pd.period_range(start='2000', periods=10, freq='B')
df = pd.DataFrame({'a': 5, 'b': range_index}, index=range_index)
view = qgrid.show_grid(df)
view
import pandas as pd
import numpy as np
import qgrid
df = pd.DataFrame([(pd.Timestamp('2017-02-02'), None, 3.4), (np.nan, 2, 4.7), (pd.Timestamp('2017-02-03'), 3, None)])
qgrid.show_grid(df)