
How to Backtest a Trading Strategy in Python

Published May 20, 2026 · 15 min read · By WatchDog Bot Team

A backtest tells you how a strategy would have performed on historical data. Done right, it's a cheap way to kill bad ideas before they cost real money. Done wrong, it's a confidence booster for strategies that will lose every penny. This guide covers the full pipeline — data, vectorized vs event-driven, the metrics that actually matter, and the traps that fool 90% of beginners.

What we'll cover

  1. Why most beginner backtests lie
  2. Step 1: Get clean historical data
  3. Step 2: Write a vectorized backtest
  4. Step 3: Why event-driven is more honest
  5. Step 4: Model slippage, fees, and latency
  6. Step 5: The metrics that matter (and the ones that don't)
  7. Step 6: Walk-forward validation
  8. Recommended tools & frameworks

01 Why most beginner backtests lie

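Before any code, the most important truth: a backtest that shows a 150% annual return is almost certainly broken. Real institutional strategies target 10–25% annual returns on a risk-adjusted basis. If your script claims more, the most likely explanations are the traps covered in the steps below: lookahead bias, unmodeled fees and slippage, and parameters overfit to the same data you report results on.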

Rule of thumb: When your backtest result looks amazing, your first question should be "what's broken?" — not "how soon can I deploy?"

02 Step 1: Get clean historical data

Data quality is the foundation. Bad data = backtest is fiction. For most retail strategies, you want:

Free sources that are good enough to start:

A 2-minute crypto data pull with ccxt:

import ccxt
import pandas as pd

exchange = ccxt.binance()
ohlcv = exchange.fetch_ohlcv("BTC/USDT", timeframe="1h", limit=1000)
df = pd.DataFrame(ohlcv, columns=["ts", "open", "high", "low", "close", "volume"])
df["ts"] = pd.to_datetime(df["ts"], unit="ms")
df = df.set_index("ts")
print(df.tail())
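Before trusting a pull like this, it is worth checking it for the usual defects. The sketch below is one way to do that, assuming an OHLCV DataFrame indexed by timestamp like the one built above. The helper name `check_ohlcv` and the specific checks are illustrative choices, not part of ccxt or pandas, and the sample frame is synthetic so the example runs offline.

```python
import pandas as pd

def check_ohlcv(df, freq="1h"):
    """Return simple data-quality counts for an OHLCV frame indexed by timestamp."""
    expected = pd.date_range(df.index.min(), df.index.max(), freq=freq)
    bad_bar = (
        (df["high"] < df["low"])
        | (df["close"] > df["high"])
        | (df["close"] < df["low"])
    )
    return {
        "duplicate_timestamps": int(df.index.duplicated().sum()),
        "missing_bars": int(len(expected.difference(df.index))),
        "nan_values": int(df.isna().sum().sum()),
        "inconsistent_bars": int(bad_bar.sum()),
    }

# Synthetic example: five hourly bars with the third one removed.
idx = pd.date_range("2024-01-01", periods=5, freq="1h")
sample = pd.DataFrame(
    {"open": 100.0, "high": 101.0, "low": 99.0, "close": 100.5, "volume": 10.0},
    index=idx,
).drop(idx[2])

report = check_ohlcv(sample)
print(report)
```

Any non-zero count is a reason to look at the raw data before running a strategy on it.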

03 Step 2: Write a vectorized backtest

"Vectorized" means: express your entire strategy as pandas/numpy operations on a whole price series at once. Fast (seconds for years of data), great for prototyping. Here's a moving-average crossover on the data above:

import numpy as np

# Generate signals
df["short_ma"] = df["close"].rolling(5).mean()
df["long_ma"]  = df["close"].rolling(20).mean()
df["signal"]   = (df["short_ma"] > df["long_ma"]).astype(int)

# Position: hold the signal from the PREVIOUS bar
# (shift(1) avoids lookahead bias — you don't know today's close at today's open)
df["position"] = df["signal"].shift(1).fillna(0)

# Returns
df["returns"]          = df["close"].pct_change()
df["strategy_returns"] = df["position"] * df["returns"]

# Equity curve
df["equity"] = (1 + df["strategy_returns"]).cumprod()
print(df[["close", "position", "strategy_returns", "equity"]].tail())

total_return = df["equity"].iloc[-1] - 1
print(f"Total return: {total_return:.2%}")

That's a complete vectorized backtest in ~10 lines. Fast and useful for first-pass sanity checks. But it has hidden assumptions worth calling out.

Hidden assumption: Vectorized backtests implicitly assume you can trade at every bar's close at exactly the close price, with no slippage. That's never true in practice. Vectorized is good for "does this signal even have edge?" — not for "should I deploy with $100k."
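The `shift(1)` in the backtest above matters more than it looks. A minimal sketch of why, assuming synthetic random-walk prices (which have no edge by construction) and the same 5/20 moving-average signal; the seed and series length are arbitrary choices for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic random-walk prices: there is nothing real to find here.
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 20_000))))

signal = (close.rolling(5).mean() > close.rolling(20).mean()).astype(int)
returns = close.pct_change().fillna(0)

# Honest: today's position comes from yesterday's signal.
honest = (signal.shift(1).fillna(0) * returns).sum()
# Leaky: the signal already includes the close that produced today's return.
leaky = (signal * returns).sum()

print(f"sum of per-bar returns, honest: {honest:.2f}")
print(f"sum of per-bar returns, leaky:  {leaky:.2f}")
```

The leaky version looks profitable on pure noise because each bar's return is partly baked into the signal that "predicted" it. That gap is the lookahead bias.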

04 Step 3: Why event-driven is more honest

An event-driven backtest simulates time passing. It walks through bars one at a time, only sees information available up to that moment, decides whether to send orders, and matches those orders against the bars that follow. Closer to live trading because the data flow matches what happens in production.

A minimal event-driven structure:

import pandas as pd

class Backtest:
    def __init__(self, df, initial_cash=10000):
        self.df = df.copy()  # copy so run() doesn't add columns to the caller's frame
        self.cash = initial_cash
        self.position = 0  # in units of the asset
        self.history = []

    def on_bar(self, ts, bar, strategy):
        # 1. strategy decides
        decision = strategy.decide(ts, bar, self.position, self.cash)

        # 2. simulate fill (next bar's open, with cost model)
        if decision == "buy" and self.position == 0:
            fill_price = bar["next_open"] * 1.0005  # 5bps slippage
            self.position = self.cash / fill_price
            self.cash = 0
        elif decision == "sell" and self.position > 0:
            fill_price = bar["next_open"] * 0.9995
            self.cash = self.position * fill_price
            self.position = 0

        # 3. mark-to-market
        equity = self.cash + self.position * bar["close"]
        self.history.append({"ts": ts, "equity": equity})

    def run(self, strategy):
        # Pre-compute next-bar open for fill simulation
        self.df["next_open"] = self.df["open"].shift(-1)
        for ts, bar in self.df.iterrows():
            if pd.isna(bar["next_open"]):
                break
            self.on_bar(ts, bar, strategy)

This is more code but more truthful. Fills happen at the next bar's open (you can't fill on a candle you haven't seen yet), slippage is modeled, the equity curve reflects what an actual broker statement would show.
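To see the pattern run end to end, here is a compact, self-contained variant. It is a sketch, not the class above: it keeps each decision as a pending order and fills it at the next bar's open, so the mark-to-market never values a position before it has been filled. `SmaCross` and `run_event_loop` are hypothetical names, the `decide(ts, bar, position, cash)` interface is borrowed from the class above, and the bars are synthetic.

```python
from collections import deque

import numpy as np
import pandas as pd


class SmaCross:
    """Hypothetical strategy: long when the fast mean of closes is above the slow mean."""

    def __init__(self, fast=5, slow=20):
        self.fast = fast
        self.closes = deque(maxlen=slow)  # holds only bars seen so far

    def decide(self, ts, bar, position, cash):
        self.closes.append(bar["close"])
        if len(self.closes) < self.closes.maxlen:
            return None  # not enough history yet
        recent = list(self.closes)
        fast_ma = sum(recent[-self.fast:]) / self.fast
        slow_ma = sum(recent) / len(recent)
        if fast_ma > slow_ma and position == 0:
            return "buy"
        if fast_ma < slow_ma and position > 0:
            return "sell"
        return None


def run_event_loop(df, strategy, cash=10_000.0, slippage=0.0005):
    """Walk bars in order; an order decided on bar t fills at the open of bar t+1."""
    position, pending, equity = 0.0, None, []
    for ts, bar in df.iterrows():
        # Fill the order decided on the previous bar at this bar's open.
        if pending == "buy" and position == 0:
            position, cash = cash / (bar["open"] * (1 + slippage)), 0.0
        elif pending == "sell" and position > 0:
            cash, position = position * bar["open"] * (1 - slippage), 0.0
        pending = strategy.decide(ts, bar, position, cash)
        equity.append(cash + position * bar["close"])  # mark-to-market at this close
    return pd.Series(equity, index=df.index)


# Synthetic hourly bars so the example runs without network access.
rng = np.random.default_rng(1)
close = pd.Series(
    100 * np.exp(np.cumsum(rng.normal(0, 0.005, 500))),
    index=pd.date_range("2024-01-01", periods=500, freq="1h"),
)
bars = pd.DataFrame({"open": close.shift(1).fillna(100.0), "close": close})

equity_curve = run_event_loop(bars, SmaCross())
print(equity_curve.tail())
```

The strategy object only ever receives the current bar and keeps its own history, which makes it structurally hard to peek at future data.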

05 Step 4: Model slippage, fees, and latency

Three cost components your backtest probably ignores:

Cost | Typical value | How to model
Exchange fees | 0.05–0.10% per side (crypto), free–0.005% (US equities) | Subtract from every fill
Slippage | 1–10 bps per side, more on illiquid markets | Adjust fill price; or use a volume-impact model
Latency | 50–500 ms retail, 1–10 µs HFT | For low-freq strategies, ignore. For high-freq, fill at later bar.

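For most retail strategies, a flat 10 basis points round-trip cost (fees plus slippage on both sides) is a decent first approximation. On venues at the high end of the fee range above, fees alone exceed that, so use a larger number there. If your strategy doesn't survive that, it doesn't survive live.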
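In a vectorized backtest, a flat cost can be charged every time the position changes. A minimal sketch, assuming a position series of 0/1 and per-bar returns aligned on the same index; `net_returns` is a hypothetical helper, and 5 bps per side corresponds to the 10 bps round trip mentioned above:

```python
import pandas as pd

def net_returns(position, returns, cost_per_side=0.0005):
    """Subtract a flat cost each time the position changes."""
    traded = position.diff().abs()
    traded.iloc[0] = abs(position.iloc[0])  # entering on the first bar is also a trade
    return position * returns - traded * cost_per_side

position = pd.Series([0, 1, 1, 0])             # flat, enter, hold, exit
returns = pd.Series([0.0, 0.01, 0.02, -0.01])  # per-bar asset returns
net = net_returns(position, returns)
print(net.round(4).tolist())
```

The entry bar and the exit bar each pay 5 bps; the holding bar pays nothing. A strategy that flips position often pays this on a large share of its bars, which is why high-turnover signals tend to suffer most once costs are included.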

06 Step 5: The metrics that matter

"Total return" is the least useful metric. Anyone can pick a leveraged crypto bull market and claim 800% returns. What separates real strategies from gambling is risk-adjusted performance.

The core five to always compute:

import numpy as np

def metrics(equity_curve, risk_free_rate=0.04, periods_per_year=252):
    # periods_per_year: 252 for daily equity bars. For the hourly crypto data
    # pulled earlier (a 24/7 market), pass 24 * 365 instead.
    returns = equity_curve.pct_change().dropna()
    n_years = len(returns) / periods_per_year

    # Annualized return
    cagr = (equity_curve.iloc[-1] / equity_curve.iloc[0]) ** (1/n_years) - 1

    # Annualized volatility
    ann_vol = returns.std() * np.sqrt(periods_per_year)

    # Sharpe ratio (per year), using CAGR as the return estimate
    sharpe = (cagr - risk_free_rate) / ann_vol

    # Sortino ratio (only downside vol)
    downside = returns[returns < 0].std() * np.sqrt(periods_per_year)
    sortino = (cagr - risk_free_rate) / downside

    # Max drawdown: worst peak-to-trough decline of the equity curve
    cummax = equity_curve.cummax()
    drawdown = (equity_curve / cummax - 1)
    max_dd = drawdown.min()

    return {
        "CAGR":      f"{cagr:.1%}",
        "Vol":       f"{ann_vol:.1%}",
        "Sharpe":    f"{sharpe:.2f}",
        "Sortino":   f"{sortino:.2f}",
        "Max DD":    f"{max_dd:.1%}",
    }
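Max drawdown is the metric people most often compute wrong, so here is the calculation on a six-point equity curve small enough to check by hand. The numbers are made up for illustration.

```python
import pandas as pd

equity = pd.Series([100, 120, 90, 110, 80, 130], dtype=float)

running_peak = equity.cummax()        # 100, 120, 120, 120, 120, 130
drawdown = equity / running_peak - 1  # 0 at new highs, negative below the peak
max_dd = drawdown.min()

print(f"Max drawdown: {max_dd:.1%}")  # → Max drawdown: -33.3%
```

The worst point is the drop from the peak of 120 to 80, not the drop from 110 to 80: drawdown is always measured against the highest equity reached so far.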

Targets to aim for:

What does NOT matter (or matters less than people think):

07 Step 6: Walk-forward validation

The most common backtesting mistake: optimizing parameters on the same dataset you report results on. This guarantees overfitting.

The fix is walk-forward analysis:

  1. Split your data into N rolling windows (say, 12 windows of 1 year each)
  2. For each window: optimize parameters on the first 9 months, then test on the remaining 3 months. The test result is what counts — the optimization is throwaway.
  3. Concatenate the 3-month out-of-sample test segments. That's your honest performance.

def walk_forward(df, train_months=9, test_months=3):
    # optimize_on() and run_backtest() are placeholders for your own
    # parameter search and backtest functions.
    results = []
    end = df.index[-1]
    current = df.index[0]

    while current + pd.DateOffset(months=train_months + test_months) <= end:
        train_end = current + pd.DateOffset(months=train_months)
        test_end  = current + pd.DateOffset(months=train_months + test_months)

        # Half-open windows [start, end). Label slices on a DatetimeIndex
        # include both endpoints, which would put the boundary bar in
        # both train and test.
        train = df[(df.index >= current) & (df.index < train_end)]
        test  = df[(df.index >= train_end) & (df.index < test_end)]

        best_params = optimize_on(train)        # your strategy's hyperparam search
        test_equity = run_backtest(test, best_params)
        results.append(test_equity)

        current += pd.DateOffset(months=test_months)

    return pd.concat(results)

If your strategy looks great on full-dataset optimization but collapses on walk-forward, you don't have a strategy — you have a hindsight model.
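For a version that runs as-is, here is a sketch that fills in the optimization and test steps with the moving-average crossover from earlier. Everything specific here is an assumption for illustration: the helper names `sma_returns` and `walk_forward_sma`, the three-pair parameter grid, "best" meaning highest summed training return, and the synthetic daily prices.

```python
import numpy as np
import pandas as pd

def sma_returns(close, fast, slow):
    """Per-bar returns of a long/flat SMA crossover, position lagged one bar."""
    signal = (close.rolling(fast).mean() > close.rolling(slow).mean()).astype(int)
    return signal.shift(1).fillna(0) * close.pct_change().fillna(0)

def walk_forward_sma(close, grid, train_months=9, test_months=3):
    """Pick the best (fast, slow) pair on each training window, score it out of sample."""
    pieces = []
    current, end = close.index[0], close.index[-1]
    while current + pd.DateOffset(months=train_months + test_months) <= end:
        train_end = current + pd.DateOffset(months=train_months)
        test_end = current + pd.DateOffset(months=train_months + test_months)

        train = close[(close.index >= current) & (close.index < train_end)]
        best = max(grid, key=lambda p: sma_returns(train, *p).sum())

        # Warm up the moving averages on earlier history, but keep only
        # the returns that fall inside the test window.
        history = close[close.index < test_end]
        rets = sma_returns(history, *best)
        pieces.append(rets[rets.index >= train_end])

        current += pd.DateOffset(months=test_months)
    return pd.concat(pieces)

# Synthetic daily prices so the example runs offline.
rng = np.random.default_rng(2)
dates = pd.date_range("2020-01-01", "2023-12-31", freq="D")
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, len(dates)))), index=dates)

grid = [(5, 20), (10, 50), (20, 100)]
oos = walk_forward_sma(close, grid)
oos_equity = (1 + oos).cumprod()
print(f"Out-of-sample bars: {len(oos)}, final equity multiple: {oos_equity.iloc[-1]:.2f}")
```

Stitching per-bar returns rather than per-window equity curves gives one continuous out-of-sample curve, which can then go straight into a metrics function.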

Recommended tools & frameworks

Tool | When to use it
vectorbt | Vectorized backtests, fast parameter sweeps, gorgeous plots
backtrader | Event-driven, broker simulation, multi-asset, more code
Freqtrade | Crypto-only, integrated hyperopt, best-in-class for parameter optimization
QuantConnect Lean | Cloud-based, multi-asset, used by institutional clients
Pure pandas | Anything you want to fully understand from scratch

For early prototyping, pure pandas (like the examples above) is hard to beat — you see exactly what's happening at every line, no framework magic.

A good backtest doesn't prove your strategy will work. It proves your strategy isn't obviously broken.
