In [37]:

pricing = get_pricing(['AAPL', 'MSFT'], 
                      start_date='2014-01-01', 
                      end_date='2014-01-07', 
                      frequency='minute',
                      fields='price')

pricing is a DataFrame with the same structure as the return value of history on quantopian.

In [38]:

pricing.head(10)

Out[38]:

	Security(24 [AAPL])	Security(5061 [MSFT])
2014-01-02 14:31:00+00:00	79.446	37.340
2014-01-02 14:32:00+00:00	79.424	37.375
2014-01-02 14:33:00+00:00	79.490	37.260
2014-01-02 14:34:00+00:00	79.502	37.260
2014-01-02 14:35:00+00:00	79.252	37.280
2014-01-02 14:36:00+00:00	79.184	37.283
2014-01-02 14:37:00+00:00	79.260	37.270
2014-01-02 14:38:00+00:00	79.300	37.300
2014-01-02 14:39:00+00:00	79.259	37.300
2014-01-02 14:40:00+00:00	79.222	37.280

Pandas' built-in groupby and apply operations are extremely powerful. For more information on these features, see http://pandas.pydata.org/pandas-docs/stable/groupby.html.

In [39]:

from pandas.tseries.tools import normalize_date

def my_grouper(ts):
    "Function to apply to the index of the DataFrame to break it into groups."
    # Returns midnight of the supplied date.
    return normalize_date(ts)


def first_thirty_minutes(frame):
    "Function to apply to the resulting groups."
    return frame.iloc[:30]

The result of a groupby computation is a Hierarchichally-Indexed DataFrame where the outermost layer of the index is the groupby key, and the secondary layers are the values from the frame's original index.

In [40]:

data = pricing.groupby(my_grouper).apply(first_thirty_minutes)
data.head(40)

Out[40]:

		Security(24 [AAPL])	Security(5061 [MSFT])
2014-01-02 00:00:00+00:00	2014-01-02 14:31:00+00:00	79.446	37.3400
	2014-01-02 14:32:00+00:00	79.424	37.3750
	2014-01-02 14:33:00+00:00	79.490	37.2600
	2014-01-02 14:34:00+00:00	79.502	37.2600
	2014-01-02 14:35:00+00:00	79.252	37.2800
	2014-01-02 14:36:00+00:00	79.184	37.2830
	2014-01-02 14:37:00+00:00	79.260	37.2700
	2014-01-02 14:38:00+00:00	79.300	37.3000
	2014-01-02 14:39:00+00:00	79.259	37.3000
	2014-01-02 14:40:00+00:00	79.222	37.2800
	2014-01-02 14:41:00+00:00	79.187	37.2500
	2014-01-02 14:42:00+00:00	79.069	37.2500
	2014-01-02 14:43:00+00:00	79.160	37.2300
	2014-01-02 14:44:00+00:00	79.186	37.2325
	2014-01-02 14:45:00+00:00	79.213	37.1950
	2014-01-02 14:46:00+00:00	79.152	37.1601
	2014-01-02 14:47:00+00:00	79.183	37.1800
	2014-01-02 14:48:00+00:00	79.202	37.1900
	2014-01-02 14:49:00+00:00	79.159	37.1550
	2014-01-02 14:50:00+00:00	79.190	37.1700
	2014-01-02 14:51:00+00:00	79.150	37.1600
	2014-01-02 14:52:00+00:00	79.064	37.2150
	2014-01-02 14:53:00+00:00	79.134	37.2500
	2014-01-02 14:54:00+00:00	79.074	37.2400
	2014-01-02 14:55:00+00:00	79.139	37.2350
	2014-01-02 14:56:00+00:00	79.164	37.1800
	2014-01-02 14:57:00+00:00	79.163	37.1600
	2014-01-02 14:58:00+00:00	79.217	37.1800
	2014-01-02 14:59:00+00:00	79.165	37.2001
	2014-01-02 15:00:00+00:00	79.144	37.2501
2014-01-03 00:00:00+00:00	2014-01-03 14:31:00+00:00	78.937	37.1800
	2014-01-03 14:32:00+00:00	78.702	37.1700
	2014-01-03 14:33:00+00:00	78.756	37.1301
	2014-01-03 14:34:00+00:00	78.552	37.1450
	2014-01-03 14:35:00+00:00	78.573	37.1500
	2014-01-03 14:36:00+00:00	78.616	37.1400
	2014-01-03 14:37:00+00:00	78.693	37.1100
	2014-01-03 14:38:00+00:00	78.630	37.1250
	2014-01-03 14:39:00+00:00	78.589	37.0900
	2014-01-03 14:40:00+00:00	78.543	37.1040

Because our DataFrame is Hierarchically-Indexed, we can query it by our groupby keys.

In [41]:

from pandas import Timestamp
# This gives us the first thirty minutes of January 3rd.
data.loc[Timestamp('2014-01-03', tz='UTC')]

Out[41]:

	Security(24 [AAPL])	Security(5061 [MSFT])
2014-01-03 14:31:00+00:00	78.937	37.1800
2014-01-03 14:32:00+00:00	78.702	37.1700
2014-01-03 14:33:00+00:00	78.756	37.1301
2014-01-03 14:34:00+00:00	78.552	37.1450
2014-01-03 14:35:00+00:00	78.573	37.1500
2014-01-03 14:36:00+00:00	78.616	37.1400
2014-01-03 14:37:00+00:00	78.693	37.1100
2014-01-03 14:38:00+00:00	78.630	37.1250
2014-01-03 14:39:00+00:00	78.589	37.0900
2014-01-03 14:40:00+00:00	78.543	37.1040
2014-01-03 14:41:00+00:00	78.424	37.1000
2014-01-03 14:42:00+00:00	78.462	37.1400
2014-01-03 14:43:00+00:00	78.539	37.1500
2014-01-03 14:44:00+00:00	78.573	37.1600
2014-01-03 14:45:00+00:00	78.509	37.1120
2014-01-03 14:46:00+00:00	78.510	37.0600
2014-01-03 14:47:00+00:00	78.477	37.0600
2014-01-03 14:48:00+00:00	78.519	37.0600
2014-01-03 14:49:00+00:00	78.501	37.0200
2014-01-03 14:50:00+00:00	78.419	37.0100
2014-01-03 14:51:00+00:00	78.392	37.0400
2014-01-03 14:52:00+00:00	78.317	37.0450
2014-01-03 14:53:00+00:00	78.287	37.0200
2014-01-03 14:54:00+00:00	78.216	37.0300
2014-01-03 14:55:00+00:00	78.230	37.0200
2014-01-03 14:56:00+00:00	78.244	37.0150
2014-01-03 14:57:00+00:00	78.299	37.0200
2014-01-03 14:58:00+00:00	78.252	37.0301
2014-01-03 14:59:00+00:00	78.344	37.0500
2014-01-03 15:00:00+00:00	78.332	37.0372

If we want to query on the second layer of the index, we have to use .xs with a level argument instead of .loc.

Note that level=1 means the second level of the index, because the levels start at index 0.

In [42]:

data.xs(Timestamp('2014-01-03 14:58:00', tz='UTC'), level=1)

Out[42]:

	Security(24 [AAPL])	Security(5061 [MSFT])
2014-01-03 00:00:00+00:00	78.252	37.0301

If we just want to work with the original index values, we can drop the extra level from our index.

In [43]:

data_copy = data.copy()
data_copy.index = data_copy.index.droplevel(0)
data_copy.head()

Out[43]:

	Security(24 [AAPL])	Security(5061 [MSFT])
2014-01-02 14:31:00+00:00	79.446	37.340
2014-01-02 14:32:00+00:00	79.424	37.375
2014-01-02 14:33:00+00:00	79.490	37.260
2014-01-02 14:34:00+00:00	79.502	37.260
2014-01-02 14:35:00+00:00	79.252	37.280