How to Backtest a Trading Strategy in Python

Published May 20, 2026 · 15 min read · By WatchDog Bot Team

A backtest tells you how a strategy would have performed on historical data. Done right, it's a cheap way to kill bad ideas before they cost real money. Done wrong, it's a confidence booster for strategies that will lose every penny. This guide covers the full pipeline: data, vectorized versus event-driven backtests, the metrics that actually matter, and the traps that catch most beginners.

What we'll cover

Why most beginner backtests lie
Step 1: Get clean historical data
Step 2: Write a vectorized backtest
Step 3: Why event-driven is more honest
Step 4: Model slippage, fees, and latency
Step 5: The metrics that matter (and the ones that don't)
Step 6: Walk-forward validation
Recommended tools & frameworks

01 Why most beginner backtests lie

Before any code, the most important truth: a backtest that shows a 150% annual return is almost certainly broken. Even well-run institutional strategies typically target something like 10–25% a year on a risk-adjusted basis. If your script claims far more, the most likely explanations are:

Lookahead bias — you accidentally used future information (e.g., today's close to decide today's open trade)
Survivorship bias — your data only includes assets that exist today, ignoring the ones that delisted/went to zero
Overfitting — you tuned parameters until the curve looked perfect on this exact dataset (and only this one)
No cost model — fills happen at the mid price, fees are zero, slippage is zero — none of which is real
Implementation bug — off-by-one indexing, wrong sign on returns, etc.
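
To make the first item concrete, here is a minimal sketch of lookahead bias on synthetic data. The prices are a random walk, so there is no real edge to find; the only difference between the two results is whether the signal is shifted by one bar. All names here are illustrative, not part of any library.

```python
import numpy as np
import pandas as pd

# Synthetic random-walk prices: by construction there is no real edge to find
rng = np.random.default_rng(0)
returns = pd.Series(rng.normal(0.0, 0.01, 2000))
close = 100 * (1 + returns).cumprod()

# "Signal" computed from the bar's own close: long if this bar closed up
signal = (close.pct_change() > 0).astype(int)

# Biased: trades the same bar that produced the signal (uses future information)
biased = (signal * close.pct_change()).fillna(0)
# Honest: the signal is only tradable from the next bar onward
honest = (signal.shift(1).fillna(0) * close.pct_change()).fillna(0)

biased_total = (1 + biased).prod() - 1
honest_total = (1 + honest).prod() - 1
print(f"with lookahead:    {biased_total:.1%}")
print(f"without lookahead: {honest_total:.1%}")
```

The biased version only ever collects up-bars, so its curve looks spectacular on pure noise. That is the kind of result the rule of thumb below is warning about.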

Rule of thumb: When your backtest result looks amazing, your first question should be "what's broken?" — not "how soon can I deploy?"

02 Step 1: Get clean historical data

Data quality is the foundation. Bad data = backtest is fiction. For most retail strategies, you want:

OHLCV bars (Open, High, Low, Close, Volume) at the resolution your strategy needs
Adjusted for splits and dividends if trading equities
Same exchange / timezone as where you'll trade live

Free sources that are good enough to start:

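Crypto: ccxt's fetch_ohlcv against any major exchange. Each call returns one page of bars (typically up to about 1,000, depending on the exchange), so longer histories mean paginating with the since parameter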
US equities: yfinance for daily data; Polygon.io for intraday (paid)
Kalshi / prediction markets: their public API exposes historical market data; check the current API docs for exactly what history is offered

A 2-minute crypto data pull with ccxt:

import ccxt
import pandas as pd

exchange = ccxt.binance()
# One request returns at most `limit` bars (the most recent ones when `since` is omitted)
ohlcv = exchange.fetch_ohlcv("BTC/USDT", timeframe="1h", limit=1000)
df = pd.DataFrame(ohlcv, columns=["ts", "open", "high", "low", "close", "volume"])
df["ts"] = pd.to_datetime(df["ts"], unit="ms")  # timestamps are UTC epoch milliseconds
df = df.set_index("ts")
print(df.tail())
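
A single call only gives you one page of bars, so a longer history needs a loop. The sketch below is one way to do it, under assumptions: `fetch_all_ohlcv` and `fetch_page` are hypothetical names, and `fetch_page(since_ms, limit)` is any callable that returns rows in the same `[ts, open, high, low, close, volume]` shape, oldest first.

```python
import pandas as pd

def fetch_all_ohlcv(fetch_page, since_ms, until_ms, limit=1000):
    """Collect OHLCV rows from since_ms up to (not including) until_ms.

    fetch_page(since_ms, limit) is a hypothetical callable returning a list of
    [ts_ms, open, high, low, close, volume] rows in ascending time order.
    """
    rows = []
    cursor = since_ms
    while cursor < until_ms:
        page = fetch_page(cursor, limit)
        if not page:
            break  # no more data available
        rows.extend(r for r in page if r[0] < until_ms)
        next_cursor = page[-1][0] + 1  # resume just after the last bar received
        if next_cursor <= cursor:
            break  # defensive: avoid looping forever on a stuck cursor
        cursor = next_cursor
    df = pd.DataFrame(rows, columns=["ts", "open", "high", "low", "close", "volume"])
    df = df.drop_duplicates("ts").sort_values("ts")
    df["ts"] = pd.to_datetime(df["ts"], unit="ms")
    return df.set_index("ts")
```

With ccxt, `fetch_page` could be `lambda since, limit: exchange.fetch_ohlcv("BTC/USDT", timeframe="1h", since=since, limit=limit)`. It is worth pausing between calls to respect the exchange's rate limit.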

03 Step 2: Write a vectorized backtest

"Vectorized" means: express your entire strategy as pandas/numpy operations on a whole price series at once. Fast (seconds for years of data), great for prototyping. Here's a moving-average crossover on the data above:

import numpy as np

# Generate signals
df["short_ma"] = df["close"].rolling(5).mean()
df["long_ma"]  = df["close"].rolling(20).mean()
df["signal"]   = (df["short_ma"] > df["long_ma"]).astype(int)

# Position: hold the signal from the PREVIOUS bar
# (the signal uses a bar's close, so it can only be traded from the next bar on;
# shift(1) avoids that lookahead bias)
df["position"] = df["signal"].shift(1).fillna(0)

# Returns (the first bar has no previous close, so treat its return as 0)
df["returns"]          = df["close"].pct_change().fillna(0)
df["strategy_returns"] = df["position"] * df["returns"]

# Equity curve
df["equity"] = (1 + df["strategy_returns"]).cumprod()
print(df[["close", "position", "strategy_returns", "equity"]].tail())

total_return = df["equity"].iloc[-1] - 1
print(f"Total return: {total_return:.2%}")

That's a complete vectorized backtest in ~10 lines. Fast and useful for first-pass sanity checks. But it has hidden assumptions worth calling out.

Hidden assumption: Vectorized backtests implicitly assume you can trade at every bar's close at exactly the close price, with no slippage. That's never true in practice. Vectorized is good for "does this signal even have edge?" — not for "should I deploy with $100k."
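
One way to make that assumption less generous while staying vectorized is to charge a flat cost every time the position changes. This is a sketch, not a full cost model: `apply_costs` and `cost_bps_per_side` are hypothetical names, and the 5 bps figure is only an assumed placeholder.

```python
import numpy as np
import pandas as pd

def apply_costs(position, asset_returns, cost_bps_per_side=5.0):
    """Net strategy returns after a flat per-side cost.

    position: 0/1 series already shifted so it is known before each bar.
    cost_bps_per_side: assumed fee + slippage per position change, in basis points.
    """
    gross = position * asset_returns.fillna(0)
    # A change in position (0->1 or 1->0) is one trade on one side
    trades = position.diff().abs().fillna(position.abs())
    costs = trades * cost_bps_per_side / 10_000
    return gross - costs

# Tiny hand-checkable example: enter on bar 1, exit on bar 3
position = pd.Series([0, 1, 1, 0, 0])
asset_returns = pd.Series([np.nan, 0.01, 0.02, -0.01, 0.03])
net = apply_costs(position, asset_returns, cost_bps_per_side=5.0)
print(net.round(4).tolist())
```

In the crossover example, `apply_costs(df["position"], df["returns"])` would replace the `strategy_returns` line. A signal that flips often pays this cost often, which is usually where a promising curve starts to bend.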

04 Step 3: Why event-driven is more honest

An event-driven backtest simulates time passing. It walks through bars one at a time, only sees information available up to that moment, decides whether to send orders, and matches those orders against the bars that follow. Closer to live trading because the data flow matches what happens in production.

A minimal event-driven structure:

import pandas as pd

class Backtest:
    def __init__(self, df, initial_cash=10000):
        self.df = df
        self.cash = initial_cash
        self.position = 0  # in units of the asset
        self.history = []

    def on_bar(self, ts, bar, next_open, strategy):
        # 1. strategy decides using only the current bar; it never sees next_open
        decision = strategy.decide(ts, bar, self.position, self.cash)

        # 2. simulate fill (next bar's open, with cost model)
        if decision == "buy" and self.position == 0:
            fill_price = next_open * 1.0005  # 5bps slippage
            self.position = self.cash / fill_price
            self.cash = 0
        elif decision == "sell" and self.position > 0:
            fill_price = next_open * 0.9995
            self.cash = self.position * fill_price
            self.position = 0

        # 3. mark-to-market at this bar's close (a simplification: the fill
        #    above is booked one bar before it would actually execute)
        equity = self.cash + self.position * bar["close"]
        self.history.append({"ts": ts, "equity": equity})

    def run(self, strategy):
        # Next-bar open is kept out of the bar handed to the strategy,
        # otherwise the strategy could peek at the future
        next_opens = self.df["open"].shift(-1)
        for (ts, bar), next_open in zip(self.df.iterrows(), next_opens):
            if pd.isna(next_open):
                break  # the last bar has no next open to fill against
            self.on_bar(ts, bar, next_open, strategy)
        return pd.DataFrame(self.history, columns=["ts", "equity"]).set_index("ts")["equity"]

This is more code, but more truthful: fills happen at the next bar's open (you can't fill on a candle you haven't seen yet), slippage is modeled, and the equity curve is closer to what an actual broker statement would show.
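
The class above expects a strategy object with a `decide(ts, bar, position, cash)` method that returns "buy", "sell", or anything else for no action. As an illustration, here is a hypothetical moving-average crossover written in that shape. `SmaCrossStrategy` is an assumed name, and it keeps its own rolling history so it can only ever use bars it has already been shown.

```python
from collections import deque

class SmaCrossStrategy:
    """Hypothetical strategy exposing decide(ts, bar, position, cash)."""

    def __init__(self, short=5, long=20):
        self.short = short
        self.closes = deque(maxlen=long)  # only bars seen so far

    def decide(self, ts, bar, position, cash):
        self.closes.append(bar["close"])
        if len(self.closes) < self.closes.maxlen:
            return "hold"  # not enough history yet
        closes = list(self.closes)
        short_ma = sum(closes[-self.short:]) / self.short
        long_ma = sum(closes) / len(closes)
        if short_ma > long_ma and position == 0:
            return "buy"
        if short_ma < long_ma and position > 0:
            return "sell"
        return "hold"
```

Usage would look like `Backtest(df).run(SmaCrossStrategy())`. Because the strategy is fed one bar at a time, there is no whole-series column to index into by mistake.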

05 Step 4: Model slippage, fees, and latency

Three cost components your backtest probably ignores:

Cost	Typical value	How to model
Exchange fees	0.05–0.10% per side (crypto), free–0.005% (US equities)	Subtract from every fill
Slippage	1–10 bps per side, more on illiquid markets	Adjust fill price; or use a volume-impact model
Latency	50–500 ms retail, 1–10 µs HFT	For low-freq strategies, ignore. For high-freq, fill at later bar.

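For most retail strategies, a flat 10 basis points round-trip cost (fees + slippage on both sides) is a decent first approximation in liquid, low-fee markets. At the crypto fee levels in the table above, fees alone come to 10–20 bps round trip, so 20–40 bps is a safer assumption there. If your strategy doesn't survive that, it doesn't survive live.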
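
The arithmetic behind a round-trip figure is simple enough to check by hand. The sketch below uses hypothetical helper names (`fill_price`, `round_trip_cost_bps`) and assumed fee and slippage inputs. It treats fees as a price adjustment, which is a simplification of how exchanges actually bill them.

```python
def fill_price(quote_price, side, fee_bps, slippage_bps):
    """Effective price paid (buy) or received (sell) after fees and slippage."""
    cost = (fee_bps + slippage_bps) / 10_000
    if side == "buy":
        return quote_price * (1 + cost)
    if side == "sell":
        return quote_price * (1 - cost)
    raise ValueError(f"unknown side: {side!r}")

def round_trip_cost_bps(fee_bps, slippage_bps):
    """Capital lost buying and immediately selling at the same quote, in bps."""
    entry = fill_price(100.0, "buy", fee_bps, slippage_bps)
    exit_ = fill_price(100.0, "sell", fee_bps, slippage_bps)
    return (1 - exit_ / entry) * 10_000

# Assumed inputs: 2.5 + 2.5 bps per side, then 10 + 5 bps per side
cheap = round_trip_cost_bps(fee_bps=2.5, slippage_bps=2.5)
pricey = round_trip_cost_bps(fee_bps=10, slippage_bps=5)
print(f"{cheap:.1f} bps vs {pricey:.1f} bps per round trip")
```

Multiply the per-round-trip figure by the number of round trips per year to see how much gross return the strategy has to earn just to break even.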

06 Step 5: The metrics that matter

"Total return" is the least useful metric. Anyone can pick a leveraged crypto bull market and claim 800% returns. What separates real strategies from gambling is risk-adjusted performance.

The core five to always compute:

import numpy as np

def metrics(equity_curve, risk_free_rate=0.04, periods_per_year=252):
    # periods_per_year: 252 for daily equity bars, 24 * 365 for hourly crypto bars
    returns = equity_curve.pct_change().dropna()
    n_years = len(returns) / periods_per_year

    # Annualized return
    cagr = (equity_curve.iloc[-1] / equity_curve.iloc[0]) ** (1 / n_years) - 1

    # Annualized volatility
    ann_vol = returns.std() * np.sqrt(periods_per_year)

    # Sharpe ratio (per year). Uses CAGR as the return estimate, a
    # simplification of the usual mean excess return
    sharpe = (cagr - risk_free_rate) / ann_vol

    # Sortino ratio: downside deviation over ALL bars, with gains counted as zero
    downside = np.sqrt((returns.clip(upper=0) ** 2).mean()) * np.sqrt(periods_per_year)
    sortino = (cagr - risk_free_rate) / downside

    # Max drawdown: worst drop from a running peak
    cummax = equity_curve.cummax()
    drawdown = (equity_curve / cummax - 1)
    max_dd = drawdown.min()

    return {
        "CAGR":      f"{cagr:.1%}",
        "Vol":       f"{ann_vol:.1%}",
        "Sharpe":    f"{sharpe:.2f}",
        "Sortino":   f"{sortino:.2f}",
        "Max DD":    f"{max_dd:.1%}",
    }
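
Max drawdown is the metric people most often compute wrong, so here is a small worked example with made-up numbers that can be checked by hand. The running peak is 120 when the curve bottoms at 80, so the worst drawdown is 80 / 120 - 1.

```python
import pandas as pd

# Made-up equity values for illustration
equity = pd.Series([100, 120, 90, 110, 80, 130], dtype=float)

running_peak = equity.cummax()        # 100, 120, 120, 120, 120, 130
drawdown = equity / running_peak - 1  # drop from the peak at each bar
max_dd = drawdown.min()

print(f"Max drawdown: {max_dd:.1%}")  # Max drawdown: -33.3%
print(f"Trough at bar: {drawdown.idxmin()}")
```

Note that the drop is measured from the peak of 120, not from the starting value of 100. Measured from the start, the same trough would look like only a 20% loss.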

Targets to aim for:

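Sharpe ≥ 1.0: anything below is barely worth the effort vs index investing
Sharpe ≥ 2.0: institutional-quality, very rare for retail
Max drawdown ≤ 30%: losses bigger than this are very hard to ride out emotionally
Sortino well above Sharpe: downside deviation is smaller than total volatility for almost any return series, so Sortino normally exceeds Sharpe anyway; a large gap suggests the strategy is asymmetric (bigger upside than downside), which is good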

What does NOT matter (or matters less than people think):

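Win rate: a 35% win rate with a 3:1 average win/loss has a far higher expected profit per trade than a 70% win rate with 0.5:1
Number of trades: a high trade count is not a virtue in itself, since every trade pays fees (though too few trades makes any result statistically weak)
"Looks like a smooth equity curve": usually the result of overfitting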

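The win-rate comparison above is just expected-value arithmetic. A quick sketch, measuring profit in units of the average loss (often written R); `expectancy` is a hypothetical helper name:

```python
def expectancy(win_rate, avg_win_to_loss):
    """Expected profit per trade, in units of the average loss (R)."""
    return win_rate * avg_win_to_loss - (1 - win_rate) * 1.0

low_win_rate = expectancy(0.35, 3.0)    # 0.35 * 3.0 - 0.65 * 1.0
high_win_rate = expectancy(0.70, 0.5)   # 0.70 * 0.5 - 0.30 * 1.0

print(f"35% win rate, 3:1 payoff:   {low_win_rate:+.2f}R per trade")   # +0.40R per trade
print(f"70% win rate, 0.5:1 payoff: {high_win_rate:+.2f}R per trade")  # +0.05R per trade
```

Both are positive before costs, but the second has so little margin that a realistic cost model can easily push it below zero.
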
07 Step 6: Walk-forward validation

The most common backtesting mistake: optimizing parameters on the same dataset you report results on. This guarantees overfitting.

The fix is walk-forward analysis:

Split your data into N rolling windows (say, 12 windows of 1 year each)
For each window: optimize parameters on the first 9 months, then test on the remaining 3 months. The test result is what counts — the optimization is throwaway.
Concatenate the 3-month out-of-sample test segments. That's your honest performance.

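import pandas as pd

def walk_forward(df, train_months=9, test_months=3):
    # optimize_on and run_backtest are placeholders for your own functions
    results = []
    start = df.index[0]
    end = df.index[-1]
    current = start

    while current + pd.DateOffset(months=train_months + test_months) <= end:
        train_end = current + pd.DateOffset(months=train_months)
        test_end = current + pd.DateOffset(months=train_months + test_months)

        # Half-open windows: label slices like df[a:b] include both endpoints,
        # which would put the boundary bar in both train and test
        train = df[(df.index >= current) & (df.index < train_end)]
        test = df[(df.index >= train_end) & (df.index < test_end)]

        best_params = optimize_on(train)        # your strategy's hyperparam search
        test_equity = run_backtest(test, best_params)
        results.append(test_equity)

        current += pd.DateOffset(months=test_months)

    # Each segment's equity starts from its own base; chain the segment
    # returns if you want one continuous curve
    return pd.concat(results)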

If your strategy looks great on full-dataset optimization but collapses on walk-forward, you don't have a strategy — you have a hindsight model.
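
Before plugging in a real optimizer, it can help to print the window boundaries and confirm that the test segments tile the calendar without gaps or overlap. The generator below is a sketch with a hypothetical name (`walk_forward_windows`) and example dates. Each interval is half-open, so a test segment ends exactly where the next one begins.

```python
import pandas as pd

def walk_forward_windows(start, end, train_months=9, test_months=3):
    """Yield (train_start, train_end, test_end); test runs from train_end to test_end."""
    current = pd.Timestamp(start)
    end = pd.Timestamp(end)
    while current + pd.DateOffset(months=train_months + test_months) <= end:
        train_end = current + pd.DateOffset(months=train_months)
        test_end = current + pd.DateOffset(months=train_months + test_months)
        yield current, train_end, test_end
        current += pd.DateOffset(months=test_months)  # roll forward one test block

windows = list(walk_forward_windows("2020-01-01", "2023-01-01"))
for train_start, train_end, test_end in windows:
    print(f"train {train_start.date()} -> {train_end.date()} | test -> {test_end.date()}")
```

Three years of data with 9-month training and 3-month test blocks gives nine windows, which means nine out-of-sample quarters to judge the strategy on.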

Recommended tools & frameworks

Tool	When to use it
vectorbt	Vectorized backtests, fast parameter sweeps, gorgeous plots
backtrader	Event-driven, broker simulation, multi-asset, more code
Freqtrade	Crypto-only, integrated hyperopt, best-in-class for parameter optimization
QuantConnect Lean	Open-source engine with a hosted cloud option, multi-asset, used by institutional clients
Pure pandas	Anything you want to fully understand from scratch

For early prototyping, pure pandas (like the examples above) is hard to beat — you see exactly what's happening at every line, no framework magic.

A good backtest doesn't prove your strategy will work. It proves your strategy isn't obviously broken.

