Zipline Refresh

A high-performance Pythonic backtesting engine for algorithmic trading strategies

Zipline is a Pythonic event-driven system for backtesting, originally developed by Quantopian. This Refresh fork modernizes the storage layer, eliminates legacy dependencies, and delivers significant performance improvements.

Documentation · PyPI · Website · Report Bug

What's New in Refresh

Phase 1: bcolz → Apache Parquet

The legacy bcolz storage layer has been fully replaced with Apache Parquet via PyArrow:

	bcolz (legacy)	Parquet (new)
Format	Custom binary + Cython	Standard columnar, zstd compressed
Daily bars	One ctable per field	Single `.parquet` file per bundle
Minute bars	Fixed-stride padding + Cython position math	Actual trading minutes only — no padding
Dependencies	bcolz (unmaintained, build failures on Python 3.12+)	pyarrow (actively maintained)
Data types	uint32 (lossy for prices > $42,949)	float64 (full precision)
Interoperability	Proprietary format	Standard Parquet — readable by pandas, Spark, DuckDB
Compression	None / blosc	zstd (2-5x smaller on disk)
Early close handling	Complex Cython exclusion logic	Eliminated — only real trading minutes stored

Phase 2: Profiling-Driven Hot Path Optimization

Systematic profiling (50 assets, 780 bars/session) identified and eliminated bottlenecks across the entire data layer:

Optimization	Speedup	Detail
bcolz → Parquet migration	N/A	Eliminated unmaintained dependency, Cython position math, uint32 truncation
Lazy per-field loading	3.2x single field	Load only requested OHLCV fields instead of all 5 at once
Vectorized lifetimes	5x	Replace per-sid Python loop with single `pd.DataFrame` construction
Batch resample aggregation	5x	Batch `load_raw_arrays` in DailyHistoryAggregator instead of per-field calls
NumPy int64 searchsorted	40x per lookup	Replace `DatetimeIndex.get_loc()` (~4.3µs) with `np.searchsorted` on int64 (~0.1µs)
Vectorized last-traded	17x	`np.flatnonzero` on volume array instead of Python backward scan

Net result: pandas DatetimeIndex overhead reduced from 46% → 6.5% of hot-path time. Per-bar latency 0.6ms → 0.3ms.

Benchmark details (50 assets x 780 bars)

Before (bcolz baseline → initial Parquet):
  pandas DatetimeIndex             46.0%  ██████████████████████████████████████████████
  get_value (reader)               13.0%  █████████████
  memoize/lazyval                  10.0%  ██████████

After (fully optimized Parquet):
  pandas DatetimeIndex              6.5%  ██████
  get_value (reader)               26.7%  ██████████████████████████
  memoize/lazyval                  12.9%  ████████████
  numpy operations                 12.2%  ████████████

Total hot-path time: 0.44s → 0.24s (1.8x faster)
Per-bar latency: 0.6ms → 0.3ms

Micro-benchmarks (500 sids x 1000 days):

Single field load: 65.5ms → 20.6ms (3.2x)
get_last_traded_dt: 3.4ms → 0.2ms (17x)
_lifetimes_map: 5.5ms → 1.1ms (5x)
Sequential get_value: 68.5ms → 23.1ms (3.0x)

Features

Event-Driven Architecture — Realistic simulation with proper order lifecycle, slippage, and commission models
Pipeline API — Factor-based screening with 20+ built-in technical factors (RSI, MACD, Bollinger, Ichimoku, etc.) and easy CustomFactor extensibility
Factor Composition — rank(), zscore(), demean(), winsorize(), top(N) with groupby for sector-neutral strategies
PyData Integration — pandas DataFrames in/out, compatible with matplotlib, scipy, statsmodels, scikit-learn
Multi-Country Support — 42 country domains with proper trading calendars via exchange_calendars
Minute & Daily Resolution — Full minute-level backtesting with proper market open/close handling

Installation

Zipline supports Python >= 3.10 and is compatible with current versions of NumFOCUS libraries.

Using `pip`

pip install zipline-refresh

From source

git clone https://github.com/teleclaws/zipline-refresh.git
cd zipline-refresh
pip install -e .

See the documentation for detailed instructions.

Quickstart

Example 1: RSI Long/Short Pipeline Strategy

Use the Pipeline API to rank stocks by RSI and build a long/short portfolio — rebalanced daily:

from zipline.api import attach_pipeline, order_target_percent, pipeline_output, schedule_function
from zipline.finance import commission, slippage
from zipline.pipeline import Pipeline
from zipline.pipeline.factors import RSI


def make_pipeline():
    rsi = RSI()
    return Pipeline(
        columns={"longs": rsi.top(3), "shorts": rsi.bottom(3)},
    )


def initialize(context):
    attach_pipeline(make_pipeline(), "my_pipeline")
    schedule_function(rebalance)
    context.set_commission(commission.PerShare(cost=0.001, min_trade_cost=1.0))
    context.set_slippage(slippage.VolumeShareSlippage())


def before_trading_start(context, data):
    context.pipeline_data = pipeline_output("my_pipeline")


def rebalance(context, data):
    pipeline_data = context.pipeline_data
    longs = pipeline_data.index[pipeline_data.longs]
    shorts = pipeline_data.index[pipeline_data.shorts]

    for asset in longs:
        order_target_percent(asset, 1.0 / 3.0)
    for asset in shorts:
        order_target_percent(asset, -1.0 / 3.0)

    for asset in context.portfolio.positions:
        if asset not in longs and asset not in shorts and data.can_trade(asset):
            order_target_percent(asset, 0)

Example 2: Multi-Factor Ranking

Combine multiple factors with ranking and normalization:

from zipline.pipeline import Pipeline
from zipline.pipeline.factors import AverageDollarVolume, Returns, RSI


def make_pipeline():
    # Factor definitions
    momentum = Returns(window_length=20).rank()
    mean_reversion = -Returns(window_length=5).rank()
    rsi_signal = RSI().rank()

    # Composite score (equal-weighted)
    composite = (momentum + mean_reversion + rsi_signal).rank()

    # Liquidity filter
    liquid = AverageDollarVolume(window_length=30).top(100)

    return Pipeline(
        columns={
            "score": composite,
            "longs": composite.top(10, mask=liquid),
            "shorts": composite.bottom(10, mask=liquid),
        },
        screen=liquid,
    )

Data Ingestion

Zipline supports CSV-based data bundles for any market:

# In ~/.zipline/extension.py
from zipline.data.bundles import register
from zipline.data.bundles.csvdir import csvdir_equities

register(
    "my-data",
    csvdir_equities(["daily"], "/path/to/csv/dir"),
    calendar_name="XNYS",
)

# Ingest and run
zipline ingest -b my-data
zipline run -f strategy.py --start 2020-1-1 --end 2024-1-1 -o results.pickle --no-benchmark -b my-data

More examples in the examples directory.

Tech Stack

Component	Version	Notes
Python	>= 3.10	Tested on 3.10 – 3.13
pandas	>= 1.3	Full NumPy 2.0 support with pandas >= 2.2.2
NumPy	>= 1.23	NumPy 2.x compatible
PyArrow	>= 14.0	Parquet I/O with zstd compression
SQLAlchemy	>= 2.0	Asset metadata & adjustment storage
exchange_calendars	>= 4.2	42 global market calendars
Cython	>= 0.29	Performance-critical components

Contributing

This project is sponsored by Kavout. Built upon the work of Quantopian/zipline and stefan-jansen/zipline-reloaded.

Found a bug or have a suggestion? Open an issue.

License

Apache 2.0. See LICENSE for details.

Zipline Refresh

Zipline Refresh

On this page