Zipline Refresh
A high-performance Pythonic backtesting engine for algorithmic trading strategies
Zipline is a Pythonic event-driven system for backtesting, originally developed by Quantopian. This Refresh fork replaces the bcolz storage layer with Apache Parquet, drops unmaintained dependencies, and cuts hot-path time by about 1.8x in the project's own benchmark (0.44s → 0.24s for 50 assets x 780 bars).
What's New in Refresh
Phase 1: bcolz → Apache Parquet
The legacy bcolz storage layer has been fully replaced with Apache Parquet via PyArrow:
| | bcolz (legacy) | Parquet (new) |
|---|---|---|
| Format | Custom binary + Cython | Standard columnar, zstd compressed |
| Daily bars | One ctable per field | Single .parquet file per bundle |
| Minute bars | Fixed-stride padding + Cython position math | Actual trading minutes only — no padding |
| Dependencies | bcolz (unmaintained, build failures on Python 3.12+) | pyarrow (actively maintained) |
| Data types | uint32 (lossy for prices > $42,949) | float64 (full precision) |
| Interoperability | Proprietary format | Standard Parquet — readable by pandas, Spark, DuckDB |
| Compression | None / blosc | zstd (2-5x smaller on disk) |
| Early close handling | Complex Cython exclusion logic | Eliminated — only real trading minutes stored |
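The "Data types" row can be illustrated with a small NumPy sketch of fixed-point price storage. The scale factor of 100,000 is an assumption, picked only because it reproduces the $42,949 ceiling quoted in the table; the legacy writer's actual factor and overflow handling are not stated here, and the function names are hypothetical.

```python
import numpy as np

# ASSUMPTION: a fixed-point scale of 100,000 is used only because it reproduces
# the ~$42,949 ceiling quoted in the table; the real legacy factor may differ.
SCALE = 100_000
UINT32_MAX = np.iinfo(np.uint32).max  # 4_294_967_295


def max_representable_price(scale: int = SCALE) -> float:
    """Largest price that fits in a uint32 after fixed-point scaling."""
    return UINT32_MAX / scale


def roundtrip_uint32(prices: np.ndarray, scale: int = SCALE) -> np.ndarray:
    """Encode prices as scaled uint32 and decode them again.

    Prices are scaled in int64 first; the int64 -> uint32 cast then wraps
    modulo 2**32, which is one way an out-of-range price gets corrupted.
    """
    scaled = np.round(prices * scale).astype(np.int64)
    return scaled.astype(np.uint32).astype(np.float64) / scale


prices = np.array([123.45, 42_000.0, 50_000.0])
decoded = roundtrip_uint32(prices)

# The first two prices survive the round trip; the third exceeds the ceiling
# and comes back as a different number. float64 storage has no such limit.
survived = np.isclose(decoded, prices)
```

Here survived is True for the first two prices and False for 50,000.0, which is the kind of silent loss that storing float64 directly avoids.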
Phase 2: Profiling-Driven Hot Path Optimization
Systematic profiling (50 assets, 780 bars/session) identified and eliminated bottlenecks across the entire data layer:
| Optimization | Speedup | Detail |
|---|---|---|
| bcolz → Parquet migration | N/A | Eliminated unmaintained dependency, Cython position math, uint32 truncation |
| Lazy per-field loading | 3.2x single field | Load only requested OHLCV fields instead of all 5 at once |
| Vectorized lifetimes | 5x | Replace per-sid Python loop with single pd.DataFrame construction |
| Batch resample aggregation | 5x | Batch load_raw_arrays in DailyHistoryAggregator instead of per-field calls |
| NumPy int64 searchsorted | 40x per lookup | Replace DatetimeIndex.get_loc() (~4.3µs) with np.searchsorted on int64 (~0.1µs) |
| Vectorized last-traded | 17x | np.flatnonzero on volume array instead of Python backward scan |
Net result: pandas DatetimeIndex overhead reduced from 46% → 6.5% of hot-path time. Per-bar latency 0.6ms → 0.3ms.
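The int64 searchsorted row in the table above can be sketched with NumPy alone. This is a minimal illustration of the technique, not the fork's reader code: build_minute_index and locate are hypothetical names, and the index is assumed to be sorted and stored as nanoseconds since the epoch.

```python
import numpy as np


def build_minute_index(start: str, periods: int) -> np.ndarray:
    """Return sorted minute timestamps as int64 nanoseconds since the epoch."""
    minutes = np.datetime64(start, "ns") + np.arange(periods) * np.timedelta64(1, "m")
    return minutes.view(np.int64)


def locate(index_ns: np.ndarray, ts: np.datetime64) -> int:
    """Position of ts in index_ns; raises KeyError if ts is not present."""
    key = ts.astype("datetime64[ns]").astype(np.int64)
    # Binary search on plain integers, with no pandas index machinery involved.
    pos = int(np.searchsorted(index_ns, key))
    if pos == len(index_ns) or index_ns[pos] != key:
        raise KeyError(ts)
    return pos


index_ns = build_minute_index("2024-01-02T14:31", 390)
pos = locate(index_ns, np.datetime64("2024-01-02T14:35"))
```

The explicit equality check matters: searchsorted returns an insertion point, so a timestamp that falls between two stored minutes would otherwise be mistaken for the next bar.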
Benchmark details (50 assets x 780 bars)
Before (bcolz baseline → initial Parquet):
pandas DatetimeIndex 46.0% ██████████████████████████████████████████████
get_value (reader) 13.0% █████████████
memoize/lazyval 10.0% ██████████
After (fully optimized Parquet):
pandas DatetimeIndex 6.5% ██████
get_value (reader) 26.7% ██████████████████████████
memoize/lazyval 12.9% ████████████
numpy operations 12.2% ████████████
Total hot-path time: 0.44s → 0.24s (1.8x faster)
Per-bar latency: 0.6ms → 0.3ms
Micro-benchmarks (500 sids x 1000 days):
- Single field load: 65.5ms → 20.6ms (3.2x)
- get_last_traded_dt: 3.4ms → 0.2ms (17x)
- _lifetimes_map: 5.5ms → 1.1ms (5x)
- Sequential get_value: 68.5ms → 23.1ms (3.0x)
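The get_last_traded_dt speedup comes from replacing a Python backward scan with np.flatnonzero on the volume array. A minimal sketch of that idea follows; last_traded_position is a hypothetical helper that returns a bar position rather than a timestamp, and it is not the fork's implementation.

```python
import numpy as np


def last_traded_position(volume: np.ndarray, upto: int) -> int:
    """Index of the last bar at or before `upto` with nonzero volume, or -1."""
    # flatnonzero returns the positions of all nonzero entries in ascending
    # order, so the final element is the most recent traded bar.
    traded = np.flatnonzero(volume[: upto + 1])
    return int(traded[-1]) if traded.size else -1


volume = np.array([0, 120, 0, 300, 0, 0])
latest = last_traded_position(volume, upto=5)
```

A caller would then map the returned position back to a timestamp through the bar index, treating -1 as "never traded".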
Features
- Event-Driven Architecture: Realistic simulation with proper order lifecycle, slippage, and commission models
- Pipeline API: Factor-based screening with 20+ built-in technical factors (RSI, MACD, Bollinger, Ichimoku, etc.) and easy CustomFactor extensibility
- Factor Composition: rank(), zscore(), demean(), winsorize(), top(N) with groupby for sector-neutral strategies
- PyData Integration: pandas DataFrames in/out, compatible with matplotlib, scipy, statsmodels, scikit-learn
- Multi-Country Support: 42 country domains with proper trading calendars via exchange_calendars
- Minute & Daily Resolution: Full minute-level backtesting with proper market open/close handling
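To make the "sector-neutral" part of Factor Composition concrete, here is a NumPy-only sketch of what grouped demean and zscore compute for a single day of factor values. It illustrates the arithmetic only and does not use the Pipeline API; demean_by_group and zscore_by_group are hypothetical names, and the population standard deviation (ddof=0) is an assumption.

```python
import numpy as np


def demean_by_group(values: np.ndarray, groups: np.ndarray) -> np.ndarray:
    """Subtract each group's mean from the values belonging to that group."""
    _, inverse = np.unique(groups, return_inverse=True)
    means = np.bincount(inverse, weights=values) / np.bincount(inverse)
    return values - means[inverse]


def zscore_by_group(values: np.ndarray, groups: np.ndarray) -> np.ndarray:
    """Demean within each group, then divide by the group's std (ddof=0)."""
    _, inverse = np.unique(groups, return_inverse=True)
    demeaned = demean_by_group(values, groups)
    variance = np.bincount(inverse, weights=demeaned**2) / np.bincount(inverse)
    return demeaned / np.sqrt(variance)[inverse]


factor = np.array([1.0, 3.0, 10.0, 20.0])
sector = np.array(["tech", "tech", "energy", "energy"])

neutral = demean_by_group(factor, sector)
scores = zscore_by_group(factor, sector)
```

After grouping, the energy names no longer dominate simply because their raw values are larger: each asset is scored relative to its own sector.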
Installation
Zipline supports Python >= 3.10 and is compatible with current versions of NumFOCUS libraries.
Using pip
pip install zipline-refresh
From source
git clone https://github.com/teleclaws/zipline-refresh.git
cd zipline-refresh
pip install -e .
See the documentation for detailed instructions.
Quickstart
Example 1: RSI Long/Short Pipeline Strategy
Use the Pipeline API to rank stocks by RSI and build a long/short portfolio — rebalanced daily:
from zipline.api import attach_pipeline, order_target_percent, pipeline_output, schedule_function
from zipline.finance import commission, slippage
from zipline.pipeline import Pipeline
from zipline.pipeline.factors import RSI


def make_pipeline():
    rsi = RSI()
    return Pipeline(
        columns={"longs": rsi.top(3), "shorts": rsi.bottom(3)},
    )


def initialize(context):
    attach_pipeline(make_pipeline(), "my_pipeline")
    schedule_function(rebalance)
    context.set_commission(commission.PerShare(cost=0.001, min_trade_cost=1.0))
    context.set_slippage(slippage.VolumeShareSlippage())


def before_trading_start(context, data):
    context.pipeline_data = pipeline_output("my_pipeline")


def rebalance(context, data):
    pipeline_data = context.pipeline_data
    longs = pipeline_data.index[pipeline_data.longs]
    shorts = pipeline_data.index[pipeline_data.shorts]
    for asset in longs:
        order_target_percent(asset, 1.0 / 3.0)
    for asset in shorts:
        order_target_percent(asset, -1.0 / 3.0)
    for asset in context.portfolio.positions:
        if asset not in longs and asset not in shorts and data.can_trade(asset):
            order_target_percent(asset, 0)

Example 2: Multi-Factor Ranking
Combine multiple factors with ranking and normalization:
from zipline.pipeline import Pipeline
from zipline.pipeline.factors import AverageDollarVolume, Returns, RSI


def make_pipeline():
    # Factor definitions
    momentum = Returns(window_length=20).rank()
    mean_reversion = -Returns(window_length=5).rank()
    rsi_signal = RSI().rank()
    # Composite score (equal-weighted)
    composite = (momentum + mean_reversion + rsi_signal).rank()
    # Liquidity filter
    liquid = AverageDollarVolume(window_length=30).top(100)
    return Pipeline(
        columns={
            "score": composite,
            "longs": composite.top(10, mask=liquid),
            "shorts": composite.bottom(10, mask=liquid),
        },
        screen=liquid,
    )

Data Ingestion
Zipline supports CSV-based data bundles for any market:
# In ~/.zipline/extension.py
from zipline.data.bundles import register
from zipline.data.bundles.csvdir import csvdir_equities

register(
    "my-data",
    csvdir_equities(["daily"], "/path/to/csv/dir"),
    calendar_name="XNYS",
)

# Ingest and run
zipline ingest -b my-data
zipline run -f strategy.py --start 2020-1-1 --end 2024-1-1 -o results.pickle --no-benchmark -b my-data

More examples in the examples directory.
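For the CSV bundle registered above, a short standard-library sketch of preparing an input file may help. It assumes the fork keeps upstream Zipline's csvdir layout: one file per symbol under a daily/ subdirectory, with the columns listed in COLUMNS. The symbol ACME, the sample rows, and the helper write_symbol_csv are all made up for illustration.

```python
import csv
import tempfile
from pathlib import Path

# ASSUMPTION: this column set follows upstream Zipline's csvdir bundle;
# check the fork's documentation before relying on it.
COLUMNS = ["date", "open", "high", "low", "close", "volume", "dividend", "split"]


def write_symbol_csv(root: Path, symbol: str, rows: list[dict]) -> Path:
    """Write rows to <root>/daily/<symbol>.csv and return the file path."""
    daily_dir = root / "daily"
    daily_dir.mkdir(parents=True, exist_ok=True)
    path = daily_dir / f"{symbol}.csv"
    with path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
    return path


# Two illustrative daily bars for a made-up symbol.
rows = [
    {"date": "2020-01-02", "open": 100.0, "high": 101.5, "low": 99.5,
     "close": 101.0, "volume": 120000, "dividend": 0.0, "split": 1.0},
    {"date": "2020-01-03", "open": 101.0, "high": 102.0, "low": 100.0,
     "close": 100.5, "volume": 95000, "dividend": 0.0, "split": 1.0},
]

root = Path(tempfile.mkdtemp())  # stands in for /path/to/csv/dir
csv_path = write_symbol_csv(root, "ACME", rows)
```

The root directory here plays the role of the path passed to csvdir_equities, and the "daily" subdirectory matches the ["daily"] argument in the register call.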
Tech Stack
| Component | Version | Notes |
|---|---|---|
| Python | >= 3.10 | Tested on 3.10 – 3.13 |
| pandas | >= 1.3 | Full NumPy 2.0 support with pandas >= 2.2.2 |
| NumPy | >= 1.23 | NumPy 2.x compatible |
| PyArrow | >= 14.0 | Parquet I/O with zstd compression |
| SQLAlchemy | >= 2.0 | Asset metadata & adjustment storage |
| exchange_calendars | >= 4.2 | 42 global market calendars |
| Cython | >= 0.29 | Performance-critical components |
Contributing
This project is sponsored by Kavout. Built upon the work of Quantopian/zipline and stefan-jansen/zipline-reloaded.
Found a bug or have a suggestion? Open an issue.
License
Apache 2.0. See LICENSE for details.