#!/usr/bin/env python
# coding: utf-8

# Zipline Beginner Tutorial
# =========================
#
# Basics
# ------
#
# Zipline is an open-source algorithmic trading simulator written in Python.
#
# The source can be found at: https://github.com/quantopian/zipline
#
# Some benefits include:
#
# * Realistic: slippage, transaction costs, order delays.
# * Stream-based: processes each event individually, which avoids look-ahead bias.
# * Batteries included: common transforms (moving average) as well as common risk calculations (Sharpe).
# * Developed and continuously updated by [Quantopian](https://www.quantopian.com), which provides an easy-to-use web-interface to Zipline, 10 years of minute-resolution historical US stock data, and live-trading capabilities. This tutorial is directed at users wishing to use Zipline without using Quantopian. If you instead want to get started on Quantopian, see [here](https://www.quantopian.com/faq#get-started).
#
# This tutorial assumes that you have zipline correctly installed; see the [installation instructions](https://github.com/quantopian/zipline#installation) if you haven't set up zipline yet.
#
# Every `zipline` algorithm consists of two functions you have to define:
# * `initialize(context)`
# * `handle_data(context, data)`
#
# Before the start of the algorithm, `zipline` calls the `initialize()` function and passes in a `context` variable. `context` is a persistent namespace in which you store variables you need to access from one algorithm iteration to the next.
#
# After the algorithm has been initialized, `zipline` calls the `handle_data()` function once for each event. At every call, it passes the same `context` variable and an event-frame called `data` containing the current trading bar with open, high, low, and close (OHLC) prices as well as volume for each stock in your universe. For more information on these functions, see the [relevant part of the Quantopian docs](https://www.quantopian.com/help#api-toplevel).
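#
# To make this calling convention concrete, here is a minimal, self-contained sketch of the loop described above. It is not zipline's implementation: `run_toy_backtest`, the `bars` list, and the dict-based `data` argument are hypothetical stand-ins, used only to show that `initialize()` runs once and `handle_data()` runs once per bar with the same `context`.

```python
from types import SimpleNamespace


def initialize(context):
    # Called once before the first bar; set up persistent state here.
    context.bars_seen = 0
    context.last_close = None


def handle_data(context, data):
    # Called once per bar; `context` carries state between calls.
    context.bars_seen += 1
    context.last_close = data["close"]


def run_toy_backtest(bars):
    """Hypothetical driver, NOT zipline's API: it only mimics the call order."""
    context = SimpleNamespace()
    initialize(context)
    for bar in bars:
        handle_data(context, bar)
    return context


# Made-up OHLCV bars, for illustration only.
bars = [
    {"open": 10.0, "high": 10.5, "low": 9.8, "close": 10.2, "volume": 1000},
    {"open": 10.2, "high": 10.9, "low": 10.1, "close": 10.7, "volume": 1200},
    {"open": 10.7, "high": 10.8, "low": 10.3, "close": 10.4, "volume": 900},
]
context = run_toy_backtest(bars)
print(context.bars_seen, context.last_close)
```

# In a real backtest, `zipline` supplies both `context` and `data`; you only write `initialize()` and `handle_data()`.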
# My first algorithm
# ----------------------
#
# Let's take a look at a very simple algorithm from the `examples` directory, `buyapple.py`:

# In[1]:


# assuming you're running this notebook in zipline/docs/notebooks
import os

if os.name == 'nt':
    # windows doesn't have the cat command, but uses 'type' similarly
    get_ipython().system(' type "..\\..\\zipline\\examples\\buyapple.py"')
else:
    get_ipython().system(' cat ../../zipline/examples/buyapple.py')


# As you can see, we first have to import some functions we would like to use. All functions commonly used in your algorithm can be found in `zipline.api`. Here we are using `order()`, which takes two arguments: a security object, and a number specifying how many shares you would like to order (if negative, `order()` will sell/short shares). In this case we want to order 10 shares of Apple at each iteration. For more documentation on `order()`, see the [Quantopian docs](https://www.quantopian.com/help#api-order).
#
# Finally, the `record()` function allows you to save the value of a variable at each iteration. You provide it with a name for the variable together with the variable itself: `varname=var`. After the algorithm has finished running, you will have access to each variable value you tracked with `record()` under the name you provided (we will see this further below). You also see how we can access the current price data of the AAPL stock in the `data` event frame (for more information see [here](https://www.quantopian.com/help#api-event-properties)).

# ## Ingesting data for your algorithm
#
# Before we can run the algorithm, we'll need some historical data for our algorithm to ingest, which we can get through a data bundle. A data bundle is a collection of pricing data, adjustment data, and an asset database. Bundles allow us to preload all of the data we will need to run backtests and store the data for future runs.
# Quantopian provides a default bundle called `quandl` which uses the [Quandl WIKI Dataset](https://www.quandl.com/data/WIKI-Wiki-EOD-Stock-Prices). You'll need a [Quandl API Key](https://docs.quandl.com/docs#section-authentication), and then you can ingest that data by running:

# In[ ]:


get_ipython().system(' QUANDL_API_KEY= zipline ingest -b quandl')


# For more information on data bundles, such as building custom data bundles, you can look at the [zipline docs](https://www.zipline.io/bundles.html).

# ## Running the algorithm
#
# To now test this algorithm on financial data, `zipline` provides two interfaces: a command-line interface and an `IPython Notebook` interface.
#
# ### Command line interface
# After you have installed zipline, you should be able to execute the following from your command line (e.g. `cmd.exe` on Windows, or the Terminal app on OSX):

# In[24]:


get_ipython().system('zipline run --help')


# Note that you have to omit the preceding '!' when you call `zipline run` from a terminal; it is only required by the IPython Notebook in which this tutorial was written.
#
# As you can see, there are a couple of flags that specify where to find your algorithm (`-f`) as well as the time range (`--start` and `--end`). Finally, you'll want to save the performance metrics of your algorithm so that you can analyze how it performed. This is done via the `--output` flag, which writes the performance `DataFrame` in the Python pickle file format.
#
# Thus, to execute our algorithm from above and save the results to `buyapple_out.pickle`, we would call `zipline run` as follows:

# In[25]:


get_ipython().system('zipline run -f ../../zipline/examples/buyapple.py --start 2016-1-1 --end 2018-1-1 -o buyapple_out.pickle')


# `zipline run` first outputs the algorithm contents.
# It then uses historical price and volume data of Apple from the `quandl` bundle in the desired time range, calls the `initialize()` function, and then streams the historical stock price day-by-day through `handle_data()`. On each call to `handle_data()` we instruct `zipline` to order 10 shares of AAPL. After the call of the `order()` function, `zipline` enters the ordered stock and amount in the order book. After the `handle_data()` function has finished, `zipline` looks for any open orders and tries to fill them. If the trading volume is high enough for this stock, the order is executed after adding the commission and applying the slippage model, which models the influence of your order on the stock price, so your algorithm will be charged more than just the stock price * 10. (Note that you can also change the commission and slippage model that `zipline` uses; see the [Quantopian docs](https://www.quantopian.com/help#ide-slippage) for more information.)
#
# Note that there is also an `analyze()` function printed. `zipline run` will look for a file ending in `_analyze.py` with the same name as the algorithm (so `buyapple_analyze.py`) or for an `analyze()` function directly in the script. If an `analyze()` function is found, it will be called *after* the simulation has finished and passed the performance `DataFrame`. (The reason for allowing specification of an `analyze()` function in a separate file is that this way `buyapple.py` remains a valid Quantopian algorithm that you can copy and paste to the platform.)
#
# Let's take a quick look at the performance `DataFrame`. For this, we use `pandas` from inside the IPython Notebook and print the first five rows. Note that `zipline` makes heavy use of `pandas`, especially for data input and output, so it's worth spending some time to learn it.
# In[26]:


import pandas as pd

perf = pd.read_pickle('buyapple_out.pickle')  # read in perf DataFrame
perf.head()


# As you can see, there is a row for each trading day, starting on the first business day of 2016. In the columns you can find various information about the state of your algorithm. The very first column `AAPL` was placed there by the `record()` function mentioned earlier and allows us to plot the price of Apple. For example, we can now easily examine how our portfolio value changed over time compared to the AAPL stock price.

# In[27]:


get_ipython().run_line_magic('pylab', 'inline')
figsize(12, 12)
import matplotlib.pyplot as plt

ax1 = plt.subplot(211)
perf.portfolio_value.plot(ax=ax1)
ax1.set_ylabel('Portfolio Value')
ax2 = plt.subplot(212, sharex=ax1)
perf.AAPL.plot(ax=ax2)
ax2.set_ylabel('AAPL Stock Price')


# As you can see, our algorithm's performance as assessed by the `portfolio_value` closely matches that of the AAPL stock price. This is not surprising, as our algorithm only bought AAPL every chance it got.

# ### IPython Notebook
#
# The [IPython Notebook](http://ipython.org/notebook.html) is a very powerful browser-based interface to a Python interpreter (this tutorial was written in it). As it is already the de-facto interface for most quantitative researchers, `zipline` provides an easy way to run your algorithm inside the Notebook without requiring you to use the CLI.
#
# To use it, you have to write your algorithm in a cell and let `zipline` know that it is supposed to run this algorithm. This is done via the `%%zipline` IPython magic command, which is available after you run `%load_ext zipline` in a separate cell. This magic takes the same arguments as the command line interface described above.
# In[28]:


get_ipython().run_line_magic('load_ext', 'zipline')


# In[29]:


get_ipython().run_cell_magic('zipline', '--start 2016-1-1 --end 2018-1-1 -o perf_ipython.pickle', """
from zipline.api import symbol, order, record

def initialize(context):
    context.asset = symbol('AAPL')

def handle_data(context, data):
    order(context.asset, 10)
    record(AAPL=data.current(context.asset, 'price'))
""")


# Note that we did not have to specify an input file as above, since the magic will use the contents of the cell and look for your algorithm functions there.

# In[30]:


pd.read_pickle('perf_ipython.pickle').head()


# ## Access to previous prices using `data.history()`
#
# ### Working example: Dual Moving Average Cross-Over
#
# The Dual Moving Average (DMA) is a classic momentum strategy. It's probably not used by any serious trader anymore but is still very instructive. The basic idea is that we compute two rolling or moving averages (mavg): one with a longer window that is supposed to capture long-term trends, and one with a shorter window that is supposed to capture short-term trends. Once the short-mavg crosses the long-mavg from below, we assume that the stock price has upward momentum and go long the stock. If the short-mavg crosses from above, we exit the position, as we assume the stock will go down further.
#
# As we need access to previous prices to implement this strategy, we need a new concept: History
#
# `data.history()` is a convenience function that keeps a rolling window of data for you. The first argument is the asset or iterable of assets you're using, the second argument is the field you're looking for (e.g. `price`, `open`, `volume`), the third argument is the number of bars, and the fourth argument is your frequency (either `'1d'` or `'1m'`, but note that you need to have minute-level data for using `'1m'`).
#
# For a more detailed description of `data.history()`'s features, see the [Quantopian docs](https://www.quantopian.com/help#ide-history).
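#
# As a rough illustration of the rolling window and the crossover rule, here is a plain-Python sketch with no zipline dependency. `crossover_signals`, the window sizes, and the price list are all made up for this example; `data.history()` itself returns pandas objects and handles the windowing for you.

```python
from collections import deque


def crossover_signals(prices, short_window, long_window):
    """Return (index, 'buy' | 'sell') tuples for moving-average crossovers.

    Hypothetical helper for illustration; it is not part of zipline.
    """
    window = deque(maxlen=long_window)  # rolling window of recent prices
    signals = []
    invested = False
    for i, price in enumerate(prices):
        window.append(price)
        if len(window) < long_window:
            continue  # wait until the long window is full
        recent = list(window)
        long_mavg = sum(recent) / long_window
        short_mavg = sum(recent[-short_window:]) / short_window
        if short_mavg > long_mavg and not invested:
            signals.append((i, "buy"))
            invested = True
        elif short_mavg < long_mavg and invested:
            signals.append((i, "sell"))
            invested = False
    return signals


# Made-up prices: a rise followed by a decline.
prices = [10, 10, 10, 10, 11, 12, 13, 12, 10, 8, 7]
signals = crossover_signals(prices, short_window=2, long_window=4)
print(signals)
```

# The sketch emits a signal only when the position changes, whereas an algorithm that calls `order_target()` on every bar lets `zipline` work out whether any shares still need to be traded.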
# Let's look at the strategy, which should make this clear:

# In[31]:


get_ipython().run_line_magic('pylab', 'inline')
figsize(12, 12)


# In[32]:


get_ipython().run_cell_magic('zipline', '--start 2014-1-1 --end 2018-1-1 -o perf_dma.pickle', """
from zipline.api import order_target, record, symbol
import numpy as np
import matplotlib.pyplot as plt

def initialize(context):
    context.i = 0
    context.asset = symbol('AAPL')


def handle_data(context, data):
    # Skip first 300 days to get full windows
    context.i += 1
    if context.i < 300:
        return

    # Compute averages
    # data.history() returns a pandas Series here, because we
    # request a single asset and a single field.
    short_mavg = data.history(context.asset, 'price', bar_count=100, frequency="1d").mean()
    long_mavg = data.history(context.asset, 'price', bar_count=300, frequency="1d").mean()

    # Trading logic
    if short_mavg > long_mavg:
        # order_target orders as many shares as needed to
        # achieve the desired number of shares.
        order_target(context.asset, 100)
    elif short_mavg < long_mavg:
        order_target(context.asset, 0)

    # Save values for later inspection
    record(AAPL=data.current(context.asset, 'price'),
           short_mavg=short_mavg,
           long_mavg=long_mavg)


def analyze(context, perf):
    ax1 = plt.subplot(211)
    perf.portfolio_value.plot(ax=ax1)
    ax1.set_ylabel('portfolio value in $')
    ax1.set_xlabel('time in years')

    ax2 = plt.subplot(212, sharex=ax1)

    perf['AAPL'].plot(ax=ax2)
    perf[['short_mavg', 'long_mavg']].plot(ax=ax2)

    # .loc replaces the .ix indexer, which newer pandas versions no longer provide
    perf_trans = perf.loc[[t != [] for t in perf.transactions]]
    buys = perf_trans.loc[[t[0]['amount'] > 0 for t in perf_trans.transactions]]
    sells = perf_trans.loc[[t[0]['amount'] < 0 for t in perf_trans.transactions]]
    ax2.plot(buys.index, perf.short_mavg.loc[buys.index], '^', markersize=10, color='m')
    ax2.plot(sells.index, perf.short_mavg.loc[sells.index], 'v', markersize=10, color='k')
    ax2.set_ylabel('price in $')
    ax2.set_xlabel('time in years')
    plt.legend(loc=0)
    plt.show()
""")


# Here we are explicitly defining an `analyze()` function that gets automatically called once the backtest is done (this is not possible on Quantopian currently).
#
# Although it might not be directly apparent, the power of `history` (pun intended) should not be underestimated, as most algorithms make use of prior market developments in one form or another. You could easily devise a strategy that trains a classifier with [`scikit-learn`](http://scikit-learn.org/stable/) which tries to predict future market movements based on past prices (note that most of the `scikit-learn` functions require `numpy.ndarray`s rather than `pandas.DataFrame`s, so you can simply pass the underlying `ndarray` of a `DataFrame` via `.values`).
#
# We also used the `order_target()` function above. This and other functions like it can make order management and portfolio rebalancing much easier. See the [Quantopian documentation on order functions](https://www.quantopian.com/help#api-order-methods) for more details.
#
# # Conclusions
#
# We hope that this tutorial gave you a little insight into the architecture, API, and features of `zipline`. For next steps, check out some of the [examples](https://github.com/quantopian/zipline/tree/master/zipline/examples).
#
# Feel free to ask questions on [our mailing list](https://groups.google.com/forum/#!forum/zipline), report problems on our [GitHub issue tracker](https://github.com/quantopian/zipline/issues?state=open), [get involved](https://github.com/quantopian/zipline/wiki/Contribution-Requests), and [check out Quantopian](https://quantopian.com).