Tutorial

Zipline is an open-source algorithmic trading simulator written in Python.

Some benefits include:

Realistic: models slippage, transaction costs, and order delays.
Stream-based: processes each event individually, which avoids look-ahead bias.
Batteries included: common transforms (such as moving averages) and common risk calculations (such as the Sharpe ratio) can be computed efficiently while a backtest runs.

This tutorial assumes that you have Zipline correctly installed; see the Installation instructions if you haven’t done so yet.

How to construct an algorithm

Every Zipline algorithm consists of two functions you have to define:

initialize(context)
handle_data(context, data)

Before the start of the algorithm, Zipline calls the initialize() function and passes in a context variable. context is a persistent namespace for you to store variables you need to access from one algorithm iteration to the next.

After the algorithm has been initialized, Zipline calls the handle_data() function once for each event. At every call, it passes the same context variable and an event-frame called data containing the current trading bar with open, high, low, and close (OHLC) prices as well as volume for each stock in your universe.
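To make this calling convention concrete, here is a toy event loop in plain Python. It is not Zipline's engine. The `run_toy_backtest` driver, the made-up `bars`, and the `bar_count`/`closes` attributes are all hypothetical, and `types.SimpleNamespace` stands in for `context`. Each bar is also a plain dict rather than Zipline's data object. The sketch only mimics the order of calls described above: `initialize()` runs once, then `handle_data()` runs once per bar with the same `context`.

```python
from types import SimpleNamespace


def initialize(context):
    # Runs once before the first bar; state stored on context persists.
    context.bar_count = 0
    context.closes = []


def handle_data(context, data):
    # Runs once per bar; the same context object is passed every time.
    context.bar_count += 1
    context.closes.append(data["close"])


def run_toy_backtest(bars):
    """Hypothetical driver that mimics Zipline's call order (not Zipline's API)."""
    context = SimpleNamespace()
    initialize(context)
    for bar in bars:
        # Here each bar is a plain dict of OHLCV values for one stock.
        handle_data(context, bar)
    return context


# Made-up bars for illustration only.
bars = [
    {"open": 10.0, "high": 10.5, "low": 9.8, "close": 10.2, "volume": 1000},
    {"open": 10.2, "high": 10.9, "low": 10.1, "close": 10.8, "volume": 1200},
    {"open": 10.8, "high": 11.0, "low": 10.4, "close": 10.5, "volume": 900},
]

ctx = run_toy_backtest(bars)
print(ctx.bar_count, ctx.closes)  # 3 [10.2, 10.8, 10.5]
```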

A simple example

Let’s take a look at a very simple algorithm from the zipline/examples directory, buyapple.py. Each period, which is a trading day, it orders 10 shares of Apple stock and records the price.

from zipline.examples import buyapple

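from zipline.api import order, record, symbol


def initialize(context):
    # Nothing to set up for this simple algorithm.
    pass


def handle_data(context, data):
    # Order 10 shares of Apple on every bar (here, every trading day).
    order(symbol('AAPL'), 10)
    # Save the current AAPL price under the name 'AAPL' in the results.
    record(AAPL=data.current(symbol('AAPL'), 'price'))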

As you can see, we first import the functions we need. All functions commonly used in an algorithm can be found in zipline.api. Here we use order(), which takes two arguments: an asset object (obtained here with symbol()) and the number of shares you would like to order. If the number is negative, order() sells or shorts the stock. In this case we order 10 shares of Apple at each iteration.

Finally, the record() function allows you to save the value of a variable at each iteration. You provide it with a name for the variable together with the variable itself: varname=var. After the algorithm has finished running, you can access each value you tracked with record() under the name you provided (we will see this further below). The example also shows how to access the current price of the AAPL stock from the data event frame with data.current().
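Conceptually, record() collects one value per name per bar, and each name later becomes a column of the performance DataFrame. The stdlib-only sketch below imitates that bookkeeping with a hypothetical `Recorder` class. It is not how Zipline implements record(); it only models the behavior described above. The two prices are the AAPL values for 2016-01-04 and 2016-01-05 from the output tables later in this tutorial.

```python
from collections import defaultdict


class Recorder:
    """Hypothetical stand-in for record(): name -> {bar label: value}."""

    def __init__(self):
        self.columns = defaultdict(dict)
        self.current_bar = None

    def set_bar(self, label):
        # A simulation would advance this once per bar.
        self.current_bar = label

    def record(self, **values):
        # Keyword arguments mirror record(varname=var).
        for name, value in values.items():
            self.columns[name][self.current_bar] = value


rec = Recorder()
for day, price in [("2016-01-04", 105.35), ("2016-01-05", 102.71)]:
    rec.set_bar(day)
    rec.record(AAPL=price)

print(rec.columns["AAPL"])  # {'2016-01-04': 105.35, '2016-01-05': 102.71}
```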

How to run the algorithm

To test this algorithm on financial data, Zipline provides three interfaces: (1) the command line, via the zipline command; (2) the Jupyter Notebook, via the %%zipline magic; and (3) the run_algorithm() function, which executes your algorithm like any Python script, for example in your IDE.

Before we can run any algorithms, we need some data.

Ingesting Data

If you haven’t ingested data yet, then run:

$ zipline ingest -b <bundle>

where <bundle> is the name of the bundle to ingest. For now, you can use the default quandl bundle to work with the Quandl WIKI price data. Check out the ingesting data section for more detail on how to obtain other data.

Command Line Interface

After installing Zipline, you should be able to run the following from your command line (e.g., cmd.exe on Windows, the Terminal app on macOS, or a bash shell on Linux):

$ zipline run --help

Usage: zipline run [OPTIONS]

Run a backtest for the given algorithm.

Options:
 -f, --algofile FILENAME         The file that contains the algorithm to run.
 -t, --algotext TEXT             The algorithm script to run.
 -D, --define TEXT               Define a name to be bound in the namespace
                                 before executing the algotext. For example
                                 '-Dname=value'. The value may be any python
                                 expression. These are evaluated in order so
                                 they may refer to previously defined names.
 --data-frequency [daily|minute]
                                 The data frequency of the simulation.
                                 [default: daily]
 --capital-base FLOAT            The starting capital for the simulation.
                                 [default: 10000000.0]
 -b, --bundle BUNDLE-NAME        The data bundle to use for the simulation.
                                 [default: quandl]
 --bundle-timestamp TIMESTAMP    The date to lookup data on or before.
                                 [default: <current-time>]
 -s, --start DATE                The start date of the simulation.
 -e, --end DATE                  The end date of the simulation.
 -o, --output FILENAME           The location to write the perf data. If this
                                 is '-' the perf will be written to stdout.
                                 [default: -]
 --trading-calendar TRADING-CALENDAR
                                 The calendar you want to use e.g. LSE. NYSE
                                 is the default.
 --print-algo / --no-print-algo  Print the algorithm to stdout.
 --benchmark-file                The csv file that contains the benchmark
                                 returns (date, returns columns)
 --benchmark-symbol              The instrument's symbol to be used as
                                 a benchmark.
                                 (should exist in the ingested bundle)
 --benchmark-sid                 The sid of the instrument to be used as a
                                 benchmark.
                                 (should exist in the ingested bundle)
 --no-benchmark                  This flag is used to set the benchmark to
                                 zero. Alpha, beta and benchmark metrics
                                 are not calculated
 --help                          Show this message and exit.

As you can see, there are flags that specify where to find your algorithm (-f). There are also parameters that specify which data bundle to use (-b), which defaults to quandl.

There are also arguments for the date range to run the algorithm over (--start and --end). To use a benchmark, choose one of the benchmark options listed above (--benchmark-file, --benchmark-symbol, or --benchmark-sid). Alternatively, you can always use --no-benchmark, which uses zero returns as the benchmark. In that case, alpha, beta, and the benchmark metrics are not calculated.

Finally, you’ll want to save the performance metrics of your algorithm so that you can analyze how it performed. This is done via the --output flag, which writes the performance DataFrame to a file in Python's pickle format. You can also define a configuration file with these parameters and pass it to the -c option, so that you don’t have to supply the command-line arguments every time (see the .conf files in the examples directory).

Thus, to execute our algorithm from above and save the results to buyapple_out.pickle, we call zipline run as follows:

zipline run -f ../zipline/examples/buyapple.py --start 2016-1-1 --end 2018-1-1 -o buyapple_out.pickle --no-benchmark

AAPL
[2018-01-03 04:30:51.843465] INFO: Performance: Simulated 503 trading days out of 503.
[2018-01-03 04:30:51.843598] INFO: Performance: first open: 2016-01-04 14:31:00+00:00
[2018-01-03 04:30:51.843672] INFO: Performance: last close: 2017-12-29 21:00:00+00:00

run first calls the initialize() function and then streams the historical stock prices day by day through handle_data(). During each call to handle_data(), we instruct zipline to order 10 shares of AAPL. When order() is called, zipline enters the ordered asset and amount in the order book. After handle_data() has finished, zipline looks for open orders and tries to fill them on the next bar. This is why the first position in the output below appears on 2016-01-05 rather than 2016-01-04. If the trading volume for the stock is high enough, the order is executed after adding the commission and applying the slippage model, which models the influence of your order on the stock price. As a result, your algorithm is charged more than just 10 × the stock price. (Note that you can also change the commission and slippage model that zipline uses, see the.
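As a rough illustration of why the charge exceeds 10 × the stock price, here is a sketch of the fill arithmetic under stated assumptions. The price-impact term is modeled after Zipline's VolumeShareSlippage: the price moves by price × price_impact × volume_share², where volume_share is the order's share of the bar's volume, capped at volume_limit. The commission is a flat fee per share. The parameter values (price_impact=0.1, volume_limit=0.025, 0.001 per share) and the bar volume are assumptions for illustration, not necessarily what your Zipline version uses. The sketch also assumes the whole order fills in one bar and does not model partial fills. The price is the AAPL value for 2016-01-05 from the output below.

```python
import math


def simulate_fill(shares, price, bar_volume,
                  price_impact=0.1, volume_limit=0.025, cost_per_share=0.001):
    """Sketch of a volume-share slippage fill plus a per-share commission.

    All parameter defaults are assumptions for illustration.
    """
    if abs(shares) > volume_limit * bar_volume:
        # A real model would only partially fill; this sketch does not.
        raise ValueError("order too large for a single-bar fill in this sketch")
    direction = math.copysign(1, shares)  # +1 for buys, -1 for sells
    volume_share = min(abs(shares) / bar_volume, volume_limit)
    # The price moves against the order in proportion to the squared volume share.
    impacted_price = price + direction * price * price_impact * volume_share ** 2
    commission = abs(shares) * cost_per_share
    # Positive: cash spent on a buy. Negative: net cash received on a sell.
    cash_out = shares * impacted_price + commission
    return impacted_price, commission, cash_out


# 10 shares at the 2016-01-05 AAPL price; the 1,000,000-share volume is hypothetical.
fill_price, commission, cash_out = simulate_fill(10, 102.71, bar_volume=1_000_000)
print(round(fill_price, 6), round(commission, 4), round(cash_out, 2))  # 102.71 0.01 1027.11
```

With a small order relative to volume, the price impact is negligible and the commission accounts for most of the extra cost.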

Let’s take a quick look at the performance DataFrame. For this, we use pandas from inside the IPython Notebook and print the first five rows. Note that zipline makes heavy use of pandas, especially for data input and output, so it’s worth spending some time learning it.

import pandas as pd
perf = pd.read_pickle('buyapple_out.pickle') # read in perf DataFrame
perf.head()

	AAPL	algo_volatility	algorithm_period_return	alpha	benchmark_period_return	benchmark_volatility	beta	capital_used	ending_cash	ending_exposure	ending_value	gross_leverage	long_exposure	long_value	longs_count	max_drawdown	net_leverage	orders	period_close	period_label	period_open	pnl	portfolio_value	positions	returns	sharpe	sortino	starting_cash	starting_exposure	starting_value	trading_days	transactions
2016-01-04 21:00:00+00:00	105.35	NaN	0.000000e+00	NaN	-0.013983	NaN	NaN	0.0	10000000.0	0.0	0.0	0.000000	0.0	0.0	0	0.000000e+00	0.000000	[{'dt': 2016-01-04 21:00:00+00:00, 'reason': N...	2016-01-04 21:00:00+00:00	2016-01	2016-01-04 14:31:00+00:00	0.0	10000000.0	[]	0.000000e+00	NaN	NaN	10000000.0	0.0	0.0	1	[]
2016-01-05 21:00:00+00:00	102.71	0.000001	-1.000000e-07	-0.000022	-0.012312	0.175994	-0.000006	-1028.1	9998971.9	1027.1	1027.1	0.000103	1027.1	1027.1	1	-1.000000e-07	0.000103	[{'dt': 2016-01-05 21:00:00+00:00, 'reason': N...	2016-01-05 21:00:00+00:00	2016-01	2016-01-05 14:31:00+00:00	-1.0	9999999.0	[{'sid': Equity(8 [AAPL]), 'last_sale_price': ...	-1.000000e-07	-11.224972	-11.224972	10000000.0	0.0	0.0	2	[{'order_id': '4011063b5c094e82a5391527044098b...
2016-01-06 21:00:00+00:00	100.70	0.000019	-2.210000e-06	-0.000073	-0.024771	0.137853	0.000054	-1008.0	9997963.9	2014.0	2014.0	0.000201	2014.0	2014.0	1	-2.210000e-06	0.000201	[{'dt': 2016-01-06 21:00:00+00:00, 'reason': N...	2016-01-06 21:00:00+00:00	2016-01	2016-01-06 14:31:00+00:00	-21.1	9999977.9	[{'sid': Equity(8 [AAPL]), 'last_sale_price': ...	-2.110000e-06	-9.823839	-9.588756	9998971.9	1027.1	1027.1	3	[{'order_id': '3bf9fe20cc46468d99f741474226c03...
2016-01-07 21:00:00+00:00	96.45	0.000064	-1.081000e-05	0.000243	-0.048168	0.167868	0.000300	-965.5	9996998.4	2893.5	2893.5	0.000289	2893.5	2893.5	1	-1.081000e-05	0.000289	[{'dt': 2016-01-07 21:00:00+00:00, 'reason': N...	2016-01-07 21:00:00+00:00	2016-01	2016-01-07 14:31:00+00:00	-86.0	9999891.9	[{'sid': Equity(8 [AAPL]), 'last_sale_price': ...	-8.600019e-06	-10.592737	-9.688947	9997963.9	2014.0	2014.0	4	[{'order_id': '6af6aed9fbb44a6bba17e802051b94d...
2016-01-08 21:00:00+00:00	96.96	0.000063	-9.380000e-06	0.000466	-0.058601	0.145654	0.000311	-970.6	9996027.8	3878.4	3878.4	0.000388	3878.4	3878.4	1	-1.081000e-05	0.000388	[{'dt': 2016-01-08 21:00:00+00:00, 'reason': N...	2016-01-08 21:00:00+00:00	2016-01	2016-01-08 14:31:00+00:00	14.3	9999906.2	[{'sid': Equity(8 [AAPL]), 'last_sale_price': ...	1.430015e-06	-7.511729	-7.519659	9996998.4	2893.5	2893.5	5	[{'order_id': '18f64975732449a18fca06e9c69bf5c...

As you can see, there is a row for each trading day, starting on the first business day of 2016. The columns contain various information about the state of your algorithm. The very first column, AAPL, was placed there by the record() function mentioned earlier and allows us to plot the price of Apple. For example, we can now easily examine how our portfolio value changed over time compared to the AAPL stock price.

%matplotlib inline
import matplotlib.pyplot as plt

# Two vertically stacked panels that share the date axis.
fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True, figsize=(12, 12))
perf.portfolio_value.plot(ax=ax1)
ax1.set_ylabel('Portfolio Value')
perf.AAPL.plot(ax=ax2)
ax2.set_ylabel('AAPL Stock Price')

As you can see, our algorithm's performance, as measured by portfolio_value, closely tracks the AAPL stock price. This is not surprising, as our algorithm bought AAPL at every opportunity.
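Several columns of the performance DataFrame are tied together by simple bookkeeping. The sketch below uses values copied from the first four rows of the command-line output above and checks four relationships:

- ending_value equals the shares held times the AAPL price.
- portfolio_value equals ending_cash plus ending_value.
- pnl is the day-over-day change in portfolio_value.
- Compounding the daily returns (pnl divided by the previous portfolio value) reproduces algorithm_period_return.

These relationships are inferred from the printed numbers rather than taken from Zipline's source. The `shares_held` list is derived from ending_value divided by the AAPL price.

```python
import math

# Values copied from the first four rows of perf.head() above.
aapl = [105.35, 102.71, 100.70, 96.45]
shares_held = [0, 10, 20, 30]  # inferred: ending_value / AAPL price
ending_cash = [10000000.0, 9998971.9, 9997963.9, 9996998.4]
ending_value = [0.0, 1027.1, 2014.0, 2893.5]
portfolio_value = [10000000.0, 9999999.0, 9999977.9, 9999891.9]
pnl = [0.0, -1.0, -21.1, -86.0]
algorithm_period_return = [0.0, -1.0e-07, -2.21e-06, -1.081e-05]

for i in range(len(aapl)):
    # Position value is the number of shares held times the day's price.
    assert math.isclose(ending_value[i], shares_held[i] * aapl[i], abs_tol=1e-6)
    # Total equity is cash plus position value.
    assert math.isclose(portfolio_value[i], ending_cash[i] + ending_value[i], abs_tol=1e-6)

daily_returns = [0.0]
for i in range(1, len(portfolio_value)):
    # pnl is the change in portfolio value since the previous close.
    assert math.isclose(pnl[i], portfolio_value[i] - portfolio_value[i - 1], abs_tol=1e-6)
    daily_returns.append(pnl[i] / portfolio_value[i - 1])

# Compounding the daily returns gives the cumulative period return.
growth = 1.0
for i, r in enumerate(daily_returns):
    growth *= 1.0 + r
    assert math.isclose(growth - 1.0, algorithm_period_return[i], abs_tol=1e-9)

print([f"{r:.6e}" for r in daily_returns])
```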

Jupyter Notebook

The Jupyter Notebook is a very powerful browser-based interface to a Python interpreter (this tutorial was written in it). As it is a very popular interface for many quantitative researchers, Zipline provides an easy way to run your algorithm inside the Notebook without requiring you to use the CLI.

To use it, write your algorithm in a cell and let Zipline know that it is supposed to run this algorithm. This is done via the %%zipline IPython magic command, which becomes available after you load the zipline extension with %load_ext zipline. The magic takes the same arguments as the command line interface described above. To run the algorithm from above over the same date range, we just have to execute the following cell after loading the extension to register the magic.

%load_ext zipline

%%zipline --start 2016-1-1 --end 2018-1-1
from zipline.api import symbol, order, record

def initialize(context):
    pass

def handle_data(context, data):
    order(symbol('AAPL'), 10)
    record(AAPL=data.current(symbol('AAPL'), "price"))

Note that we did not have to specify an input file as above, since the magic uses the contents of the cell and looks for your algorithm functions there. Also, instead of writing the results to an output file, the magic returns the performance DataFrame we looked at above as the cell's output. The next cell can therefore access it through IPython's _ variable.

_.head()

	AAPL	algo_volatility	algorithm_period_return	alpha	benchmark_period_return	benchmark_volatility	beta	capital_used	ending_cash	ending_exposure	ending_value	gross_leverage	long_exposure	long_value	longs_count	max_drawdown	net_leverage	orders	period_close	period_label	period_open	pnl	portfolio_value	positions	returns	sharpe	sortino	starting_cash	starting_exposure	starting_value	trading_days	transactions
2016-01-04 21:00:00+00:00	105.35	NaN	0.000000e+00	NaN	-0.013983	NaN	NaN	0.00	10000000.00	0.0	0.0	0.000000	0.0	0.0	0	0.000000e+00	0.000000	[{'created': 2016-01-04 21:00:00+00:00, 'reaso...	2016-01-04 21:00:00+00:00	2016-01	2016-01-04 14:31:00+00:00	0.00	10000000.00	[]	0.000000e+00	NaN	NaN	10000000.00	0.0	0.0	1	[]
2016-01-05 21:00:00+00:00	102.71	1.122497e-08	-1.000000e-09	-2.247510e-07	-0.012312	0.175994	-6.378047e-08	-1027.11	9998972.89	1027.1	1027.1	0.000103	1027.1	1027.1	1	-9.999999e-10	0.000103	[{'created': 2016-01-04 21:00:00+00:00, 'reaso...	2016-01-05 21:00:00+00:00	2016-01	2016-01-05 14:31:00+00:00	-0.01	9999999.99	[{'amount': 10, 'cost_basis': 102.711000000000...	-1.000000e-09	-11.224972	-11.224972	10000000.00	0.0	0.0	2	[{'dt': 2016-01-05 21:00:00+00:00, 'order_id':...
2016-01-06 21:00:00+00:00	100.70	1.842654e-05	-2.012000e-06	-4.883861e-05	-0.024771	0.137853	5.744807e-05	-1007.01	9997965.88	2014.0	2014.0	0.000201	2014.0	2014.0	1	-2.012000e-06	0.000201	[{'created': 2016-01-05 21:00:00+00:00, 'reaso...	2016-01-06 21:00:00+00:00	2016-01	2016-01-06 14:31:00+00:00	-20.11	9999979.88	[{'amount': 20, 'cost_basis': 101.706000000000...	-2.011000e-06	-9.171989	-9.169708	9998972.89	1027.1	1027.1	3	[{'dt': 2016-01-06 21:00:00+00:00, 'order_id':...
2016-01-07 21:00:00+00:00	96.45	6.394658e-05	-1.051300e-05	2.633450e-04	-0.048168	0.167868	3.005102e-04	-964.51	9997001.37	2893.5	2893.5	0.000289	2893.5	2893.5	1	-1.051300e-05	0.000289	[{'created': 2016-01-06 21:00:00+00:00, 'reaso...	2016-01-07 21:00:00+00:00	2016-01	2016-01-07 14:31:00+00:00	-85.01	9999894.87	[{'amount': 30, 'cost_basis': 99.9543333333335...	-8.501017e-06	-10.357397	-9.552189	9997965.88	2014.0	2014.0	4	[{'dt': 2016-01-07 21:00:00+00:00, 'order_id':...
2016-01-08 21:00:00+00:00	96.96	6.275294e-05	-8.984000e-06	4.879306e-04	-0.058601	0.145654	3.118401e-04	-969.61	9996031.76	3878.4	3878.4	0.000388	3878.4	3878.4	1	-1.051300e-05	0.000388	[{'created': 2016-01-07 21:00:00+00:00, 'reaso...	2016-01-08 21:00:00+00:00	2016-01	2016-01-08 14:31:00+00:00	15.29	9999910.16	[{'amount': 40, 'cost_basis': 99.2060000000002...	1.529016e-06	-7.215497	-7.301134	9997001.37	2893.5	2893.5	5	[{'dt': 2016-01-08 21:00:00+00:00, 'order_id':...

IDE via `run_algorithm()`

To execute an algorithm like a Python script in your favorite IDE, use the run_algorithm() function (see the API Reference).

To adapt the buyapple.py example from above (see buyapple_ide.py in the same directory), simply add the following:

from zipline import run_algorithm
import pandas as pd
import pandas_datareader.data as web

def initialize(context):
    ...

def handle_data(context, data):
    ...

start = pd.Timestamp('2014')
end = pd.Timestamp('2018')

sp500 = web.DataReader('SP500', 'fred', start, end).SP500
benchmark_returns = sp500.pct_change()

result = run_algorithm(start=start.tz_localize('UTC'),
                       end=end.tz_localize('UTC'),
                       initialize=initialize,
                       handle_data=handle_data,
                       capital_base=100000,
                       benchmark_returns=benchmark_returns,
                       bundle='quandl',
                       data_frequency='daily')

We pass the key algorithm parameters to run_algorithm(), including benchmark data for the S&P 500 that we download from the Federal Reserve Economic Data (FRED) service (available for the last 10 years).

The return value, result, is the same performance DataFrame as in the previous examples. Instead of defining an analyze() function as part of the algorithm, you can apply your preferred logic to this DataFrame.
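For instance, a drawdown calculation can be written directly against the portfolio_value column, e.g. by passing result.portfolio_value.tolist(). The helper `max_drawdown` below is a hypothetical, stdlib-only sketch rather than a Zipline function. It is applied here to the five portfolio values from the command-line example above. The result, -1.081e-05, agrees with the max_drawdown column printed there for 2016-01-08.

```python
def max_drawdown(values):
    """Largest peak-to-trough decline, as a negative fraction of the running peak."""
    peak = values[0]
    worst = 0.0
    for v in values:
        peak = max(peak, v)                 # highest value seen so far
        worst = min(worst, v / peak - 1.0)  # relative drop from that peak
    return worst


# portfolio_value for 2016-01-04 through 2016-01-08 from the command-line run above.
portfolio_value = [10000000.0, 9999999.0, 9999977.9, 9999891.9, 9999906.2]
print(f"{max_drawdown(portfolio_value):.6e}")  # -1.081000e-05
```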

How to use historical prices: a dual Moving Average Cross-Over example

The Dual Moving Average (DMA) is a classic momentum strategy. It’s probably not used by any serious trader anymore, but it is still very instructive. The basic idea is to compute two rolling or moving averages (mavg). One has a longer window that is supposed to capture long-term trends; the other has a shorter window that is supposed to capture short-term trends. Once the short mavg crosses the long mavg from below, we assume that the stock price has upward momentum and go long the stock. If the short mavg crosses from above, we exit the position, as we assume the stock will go down further.
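The crossover rule itself does not depend on Zipline. Here is a plain-Python sketch with made-up prices and short windows (3 and 5 bars instead of the 100 and 300 used in the Zipline version below). It computes both simple moving averages and reports the bars where the short average crosses the long one. The helpers `sma` and `crossovers` are hypothetical. The sketch reports crossing events, whereas the Zipline version below acts on the current state (short above or below long) on every bar. Combined with order_target(), the two approaches lead to the same positions.

```python
def sma(prices, window):
    """Simple moving average over the last `window` prices (None until the window is full)."""
    return [
        sum(prices[i - window + 1:i + 1]) / window if i + 1 >= window else None
        for i in range(len(prices))
    ]


def crossovers(prices, short_window, long_window):
    """Yield (bar index, 'buy' or 'sell') where the short SMA crosses the long SMA."""
    short_avg = sma(prices, short_window)
    long_avg = sma(prices, long_window)
    prev_diff = None
    for i, (s, lg) in enumerate(zip(short_avg, long_avg)):
        if s is None or lg is None:
            continue  # not enough history yet
        diff = s - lg
        if prev_diff is not None:
            if prev_diff <= 0 < diff:
                yield i, "buy"   # crossed from below: upward momentum, go long
            elif prev_diff >= 0 > diff:
                yield i, "sell"  # crossed from above: exit the position
        prev_diff = diff


# Made-up prices: a decline, a rally, then another decline.
prices = [10, 9, 8, 7, 6, 7, 9, 11, 12, 11, 9, 7, 6]
print(list(crossovers(prices, 3, 5)))  # [(7, 'buy'), (11, 'sell')]
```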

Because this strategy needs access to previous prices, we need a new concept: history.

data.history() is a convenience function that keeps a rolling window of data for you. It takes four arguments:

1. the asset (or a list of assets);
2. the field (for example 'price');
3. bar_count, the number of bars you want to collect;
4. frequency, the bar unit: either '1d' or '1m' (note that you need minute-level data to use '1m').

For a more detailed description of history()’s features, see the API Reference. Let’s look at the strategy, which should make this clear:

%%zipline --start 2014-1-1 --end 2018-1-1 -o dma.pickle


from zipline.api import order_target, record, symbol
import matplotlib.pyplot as plt

def initialize(context):
    context.i = 0
    context.asset = symbol('AAPL')


def handle_data(context, data):
    # Skip first 300 days to get full windows
    context.i += 1
    if context.i < 300:
        return

    # Compute averages
    # With a single asset and a single field, data.history()
    # returns a pandas Series indexed by date.
    short_mavg = data.history(context.asset, 'price', bar_count=100, frequency="1d").mean()
    long_mavg = data.history(context.asset, 'price', bar_count=300, frequency="1d").mean()

    # Trading logic
    if short_mavg > long_mavg:
        # order_target orders as many shares as needed to
        # achieve the desired number of shares.
        order_target(context.asset, 100)
    elif short_mavg < long_mavg:
        order_target(context.asset, 0)

    # Save values for later inspection
    record(AAPL=data.current(context.asset, 'price'),
           short_mavg=short_mavg,
           long_mavg=long_mavg)


def analyze(context, perf):
    fig = plt.figure()
    ax1 = fig.add_subplot(211)
    perf.portfolio_value.plot(ax=ax1)
    ax1.set_ylabel('portfolio value in $')

    ax2 = fig.add_subplot(212)
    perf['AAPL'].plot(ax=ax2)
    perf[['short_mavg', 'long_mavg']].plot(ax=ax2)

    perf_trans = perf.loc[[t != [] for t in perf.transactions]]
    buys = perf_trans.loc[[t[0]['amount'] > 0 for t in perf_trans.transactions]]
    sells = perf_trans.loc[
        [t[0]['amount'] < 0 for t in perf_trans.transactions]]
    ax2.plot(buys.index, perf.short_mavg.loc[buys.index],
             '^', markersize=10, color='m')
    ax2.plot(sells.index, perf.short_mavg.loc[sells.index],
             'v', markersize=10, color='k')
    ax2.set_ylabel('price in $')
    plt.legend(loc=0)
    plt.show()

Here we also define an analyze() function. Zipline calls it automatically once the backtest is done, passing in the context and the performance DataFrame.
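The marker logic in analyze() first drops days whose transactions list is empty. It then treats a day as a buy if its first transaction has a positive amount and as a sell if that amount is negative. The same filtering can be tried on plain Python data. The rows below are hypothetical stand-ins for perf.transactions entries and keep only the 'amount' key that analyze() reads.

```python
# Hypothetical stand-in for perf.transactions: one list of transaction dicts per day,
# keeping only the 'amount' key that analyze() inspects.
transactions_by_day = {
    "day 1": [],
    "day 2": [{"amount": 100}],
    "day 3": [],
    "day 4": [{"amount": -100}],
    "day 5": [],
}

# Mirror analyze(): keep only days with at least one transaction ...
traded = {day: txns for day, txns in transactions_by_day.items() if txns != []}
# ... then classify each day by the sign of its first transaction's amount.
buys = [day for day, txns in traded.items() if txns[0]["amount"] > 0]
sells = [day for day, txns in traded.items() if txns[0]["amount"] < 0]

print(buys, sells)  # ['day 2'] ['day 4']
```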

Although it might not be directly apparent, the power of history() (pun intended) cannot be underestimated, as most algorithms make use of prior market developments in one form or another. You could easily devise a strategy that trains a classifier with scikit-learn to predict future market movements based on past prices. Note that scikit-learn works with NumPy arrays, so if needed you can pass the underlying ndarray of a DataFrame via .to_numpy().
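As a sketch of the first step of such a strategy, the hypothetical helper below turns a price history, for example the values returned by data.history(), into supervised-learning samples. Each feature row holds the previous `lags` daily returns, and the label says whether the next return is positive. It uses only the standard library; the resulting rows could then be passed to a scikit-learn classifier's fit(). The prices are made up, and no predictive value is implied.

```python
def make_lagged_samples(prices, lags):
    """Build (features, labels): features = previous `lags` returns, label = next return > 0."""
    returns = [prices[i] / prices[i - 1] - 1.0 for i in range(1, len(prices))]
    features, labels = [], []
    for t in range(lags, len(returns)):
        # Only returns strictly before bar t are used, which avoids look-ahead bias.
        features.append(returns[t - lags:t])
        labels.append(1 if returns[t] > 0 else 0)
    return features, labels


prices = [100.0, 101.0, 99.0, 102.0, 103.0, 101.0]  # made-up prices
X, y = make_lagged_samples(prices, lags=2)
print(len(X), y)  # 3 [1, 1, 0]
```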

We also used the order_target() function above. This and other functions like it can make order management and portfolio rebalancing much easier. See the API Reference for more details.
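The idea behind order_target() can be expressed as simple arithmetic: the order size is the target position minus the current position. The function below is a hypothetical illustration of that rule, not Zipline's implementation. Like the real function as described in its docstring, it ignores open orders. The sketch shows why calling order_target(context.asset, 100) on every bar of the DMA example does not keep adding shares.

```python
def shares_to_order(current_shares, target_shares):
    """Hypothetical order_target() arithmetic: buy (+) or sell (-) the difference."""
    return target_shares - current_shares


# DMA example: the short mavg stays above the long mavg for three bars, then drops below.
held = 0
orders = []
for target in [100, 100, 100, 0]:
    qty = shares_to_order(held, target)
    orders.append(qty)
    held += qty  # assume each order fills completely before the next bar

print(orders, held)  # [100, 0, 0, -100] 0
```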

Conclusions

We hope that this tutorial gave you a little insight into the architecture, API, and features of zipline. For next steps, check out some of the examples.

Feel free to ask questions on our mailing list, report problems on our GitHub issue tracker, or get involved.