Performance Metrics Every Backtest Should Report
Define and compute CAGR, max drawdown, a Sharpe-style ratio, win rate, profit factor, and expectancy in Python — and always pair return with risk.
Educational software/research content only — not investment advice, a trading signal, or a recommendation.
A single number never describes a strategy. Total return without risk context, a Sharpe ratio without trade statistics, a win rate without payoff size — each tells half the story. This article defines the core metrics, shows correct pandas/numpy code, and is honest about what each assumes. The recurring theme: always report a return metric alongside a risk metric.
CAGR — compound annual growth rate
CAGR normalizes total growth to a per-year rate, making strategies of different lengths comparable.
import numpy as np
import pandas as pd
def cagr(equity: pd.Series, periods_per_year: int = 252) -> float:
    total_growth = equity.iloc[-1] / equity.iloc[0]
    # N equity observations span N - 1 return periods
    years = (len(equity) - 1) / periods_per_year
    return total_growth ** (1 / years) - 1
periods_per_year is 252 for daily equity bars, 52 for weekly, 12 for monthly. CAGR says nothing about the path taken to get there: a smooth 12% and a violent 12% look identical. That is exactly why it must never travel alone. Check intuition against the CAGR Calculator.
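To make the "smooth versus violent" point concrete, here is a minimal sketch with two synthetic equity curves that share a start and end value. The name cagr_example and the sample data are illustrative assumptions, and the year count assumes the series includes the starting value, so N observations span N - 1 periods.

```python
import numpy as np
import pandas as pd

def cagr_example(equity: pd.Series, periods_per_year: int = 252) -> float:
    # Assumption: the series includes the starting value,
    # so N observations span N - 1 return periods.
    years = (len(equity) - 1) / periods_per_year
    return (equity.iloc[-1] / equity.iloc[0]) ** (1 / years) - 1

n = 253  # starting value plus 252 daily observations (one year)
smooth = pd.Series(np.linspace(100.0, 112.0, n))

# A "violent" path: same endpoints, with a deep dip in the middle.
violent = smooth.copy()
violent.iloc[100:150] = 70.0

print(round(cagr_example(smooth), 4))   # 0.12
print(round(cagr_example(violent), 4))  # 0.12
```

Both curves report the same CAGR because only the first and last values enter the formula; the dip to 70 is invisible to it.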
Max drawdown — the headline risk number
Maximum drawdown is the largest peak-to-trough decline in the equity curve. It is the most tangible risk measure because it answers "how bad did it get."
def max_drawdown(equity: pd.Series) -> float:
    running_max = equity.cummax()
    drawdown = equity / running_max - 1.0
    return drawdown.min()  # most negative value
cummax tracks the running peak; the drawdown series is the fractional decline below that peak at each point. Pair CAGR with max drawdown to get a return-per-unit-of-pain read (dividing CAGR by the absolute max drawdown gives what is often called the Calmar ratio). Validate with the Max Drawdown Calculator.
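A small worked example of the CAGR and drawdown pairing, sketched under stated assumptions: the equity values are hypothetical year-end figures (one observation per year), and the names max_drawdown_example and calmar_style are illustrative rather than a standard API. Published Calmar figures often use a fixed lookback window, which this sketch does not.

```python
import pandas as pd

def max_drawdown_example(equity: pd.Series) -> float:
    # Largest fractional decline from any running peak.
    running_max = equity.cummax()
    return (equity / running_max - 1.0).min()

# Hypothetical year-end equity values.
equity = pd.Series([100.0, 120.0, 90.0, 110.0, 130.0])

mdd = max_drawdown_example(equity)   # trough of 90 against a peak of 120
years = len(equity) - 1              # 5 yearly points span 4 years
cagr = (equity.iloc[-1] / equity.iloc[0]) ** (1 / years) - 1
calmar_style = cagr / abs(mdd)       # return per unit of worst decline

print(round(mdd, 4))  # -0.25
print(cagr, calmar_style)
```

The drawdown is measured from the 120 peak, not from the starting value of 100, which is why the running maximum matters.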
Sharpe-style ratio — and its assumptions
A Sharpe-style ratio divides mean excess return by the standard deviation of returns, annualized.
def sharpe(returns: pd.Series, rf: float = 0.0,
           periods_per_year: int = 252) -> float:
    excess = returns - rf / periods_per_year
    if excess.std(ddof=1) == 0:
        return np.nan
    return np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1)
Be honest about what this assumes. The sqrt(periods_per_year) annualization assumes returns are independent and identically distributed; real returns are autocorrelated and fat-tailed, so the number is an approximation. Standard deviation penalizes upside and downside equally, which may not match how you experience risk. And the ratio is highly sensitive to the sampling frequency. Treat Sharpe as one lens, not a verdict.
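The sampling-frequency point can be checked with a rough sketch: compute the ratio on synthetic daily returns, then again on the same returns compounded into 21-day blocks. Everything here is an assumption for illustration: the seeded normal returns, the 21-day "month", and the name sharpe_example. Real data, which is not i.i.d., would typically show a larger gap than simulated noise does.

```python
import numpy as np
import pandas as pd

def sharpe_example(returns: pd.Series, rf: float = 0.0,
                   periods_per_year: int = 252) -> float:
    excess = returns - rf / periods_per_year
    sd = excess.std(ddof=1)
    if sd == 0 or np.isnan(sd):
        return np.nan
    return np.sqrt(periods_per_year) * excess.mean() / sd

# Synthetic i.i.d. daily returns: 10 years of 252 days (assumed parameters).
rng = np.random.default_rng(0)
daily = pd.Series(rng.normal(0.0005, 0.01, size=2520))

# Compound consecutive 21-day blocks into 120 approximate monthly returns.
blocks = (1.0 + daily.to_numpy()).reshape(120, 21)
monthly = pd.Series(blocks.prod(axis=1) - 1.0)

sharpe_daily = sharpe_example(daily, periods_per_year=252)
sharpe_monthly = sharpe_example(monthly, periods_per_year=12)
print(sharpe_daily, sharpe_monthly)
```

The two numbers come from the same underlying path yet generally differ, so a reported Sharpe-style ratio should always state the return frequency it was computed from.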
Win rate
Win rate is the fraction of trades that were profitable. It is computed per trade, not per bar.
def win_rate(trade_pnls: pd.Series) -> float:
    return (trade_pnls > 0).mean()
A high win rate is meaningless on its own: a strategy can win 90% of the time and still lose money if the 10% of losses are large. Always read it next to payoff size.
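The 90% case is easy to reproduce with made-up numbers. In this sketch the ten trade results are hypothetical: nine small winners and one large loser.

```python
import pandas as pd

# Hypothetical per-trade P&L: nine wins of 10, one loss of 100.
trade_pnls = pd.Series([10.0] * 9 + [-100.0])

win_rate_value = (trade_pnls > 0).mean()
net_pnl = trade_pnls.sum()

print(win_rate_value)  # 0.9
print(net_pnl)         # -10.0
```

Nine wins out of ten still leave the account down 10, because the single loss is larger than all the wins combined.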
Profit factor
Profit factor is gross profit divided by gross loss. Above 1.0 means winners outweigh losers in aggregate.
def profit_factor(trade_pnls: pd.Series) -> float:
    gross_profit = trade_pnls[trade_pnls > 0].sum()
    gross_loss = -trade_pnls[trade_pnls < 0].sum()
    if gross_loss == 0:
        return np.inf
    return gross_profit / gross_loss
Expectancy
Expectancy is the average outcome per trade — the single number that combines win rate with payoff size.
def expectancy(trade_pnls: pd.Series) -> float:
    return trade_pnls.mean()
def expectancy_decomposed(trade_pnls: pd.Series) -> float:
    wins = trade_pnls[trade_pnls > 0]
    losses = trade_pnls[trade_pnls < 0]
    p_win = (trade_pnls > 0).mean()
    # Measure losses directly so breakeven trades count as neither win nor loss
    p_loss = (trade_pnls < 0).mean()
    avg_win = wins.mean() if len(wins) else 0.0
    avg_loss = losses.mean() if len(losses) else 0.0
    return p_win * avg_win + p_loss * avg_loss
The decomposed form makes the trade-off explicit: a low win rate is fine if avg_win dwarfs avg_loss. Sanity-check with the Expectancy Calculator.
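A quick check on hypothetical trade results shows win rate, profit factor, and expectancy side by side, and confirms that the decomposition reproduces the plain mean. The sample P&L values are made up. The sketch measures the loss probability directly instead of using 1 - p_win, an assumption that matters only when breakeven trades are present (with 1 - p_win, the zero-P&L trade below would be counted as a loss and the two forms would disagree).

```python
import pandas as pd

# Hypothetical per-trade P&L, including one breakeven trade.
trade_pnls = pd.Series([50.0, -20.0, 30.0, -20.0, 0.0, 120.0, -20.0, -20.0])

wins = trade_pnls[trade_pnls > 0]
losses = trade_pnls[trade_pnls < 0]

p_win = (trade_pnls > 0).mean()    # 3 of 8 trades
p_loss = (trade_pnls < 0).mean()   # 4 of 8 trades; the breakeven is neither
profit_factor_value = wins.sum() / -losses.sum()

plain = trade_pnls.mean()
decomposed = p_win * wins.mean() + p_loss * losses.mean()

print(p_win)                 # 0.375
print(profit_factor_value)   # 2.5
print(round(plain, 6))       # 15.0
print(round(decomposed, 6))  # 15.0
```

A 37.5% win rate looks weak in isolation, yet the average winner is more than three times the average loser, so each trade is worth 15 on average.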
Pair every return with a risk
The single most important habit: never report a return metric without a matching risk metric. CAGR pairs with max drawdown. Mean return pairs with volatility (Sharpe). Win rate pairs with profit factor or expectancy. A metric reported in isolation invites you to fool yourself — and these numbers are only as trustworthy as the backtest underneath them.
That last point matters most. If your equity curve was built with future-leaking signals, every metric above is inflated. Audit the pipeline first; review Avoiding Lookahead Bias in Backtests before drawing any conclusion from a results table.
Where to go next
Make sure the inputs are correct before computing any of this: build the equity curve with Computing Returns and Equity Curves with pandas and aggregate to your reporting timeframe with Resampling OHLCV Data with pandas. Then report these metrics as a panel — never a single hero number.
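One way to report the metrics as a panel is to collect them in a single pandas Series, sketched below. The function name metrics_panel, the synthetic seeded equity curve, and the sample trade P&L are all illustrative assumptions; the risk-free rate is taken as zero and the equity series is assumed to include its starting value.

```python
import numpy as np
import pandas as pd

def metrics_panel(equity: pd.Series, trade_pnls: pd.Series,
                  periods_per_year: int = 252) -> pd.Series:
    # Simple per-period returns derived from the equity curve.
    returns = (equity / equity.shift(1) - 1.0).dropna()
    years = (len(equity) - 1) / periods_per_year
    vol = returns.std(ddof=1)
    gross_profit = trade_pnls[trade_pnls > 0].sum()
    gross_loss = -trade_pnls[trade_pnls < 0].sum()
    return pd.Series({
        "cagr": (equity.iloc[-1] / equity.iloc[0]) ** (1 / years) - 1,
        "max_drawdown": (equity / equity.cummax() - 1.0).min(),
        "sharpe": (np.sqrt(periods_per_year) * returns.mean() / vol
                   if vol > 0 else np.nan),
        "win_rate": (trade_pnls > 0).mean(),
        "profit_factor": gross_profit / gross_loss if gross_loss > 0 else np.inf,
        "expectancy": trade_pnls.mean(),
    })

# Synthetic inputs for illustration only.
rng = np.random.default_rng(1)
daily_returns = rng.normal(0.0005, 0.01, size=252)
equity = pd.Series(100.0 * np.concatenate([[1.0], np.cumprod(1.0 + daily_returns)]))
trade_pnls = pd.Series([50.0, -20.0, 30.0, -20.0, 0.0, 120.0, -20.0, -20.0])

panel = metrics_panel(equity, trade_pnls)
print(panel)
```

Keeping return and risk figures in one structure makes it harder to quote CAGR without max drawdown, or win rate without profit factor and expectancy.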