FactorQX

Resampling OHLCV Data with pandas

Aggregate intraday OHLCV bars to higher timeframes with pandas resample, the correct OHLC agg dict, and the timezone, closed, and label gotchas that bite.


Educational software/research content only — not investment advice, a trading signal, or a recommendation.

Market data usually arrives at one fixed granularity — 1-minute bars, say — but research and backtests often need other timeframes. Resampling lets you roll fine-grained OHLCV (open, high, low, close, volume) bars up into coarser ones. Done wrong, it silently corrupts your bars; done right, it is a few lines of pandas.

The data shape

Start with a DataFrame indexed by a DatetimeIndex, one row per bar:

data.py
import pandas as pd
 
df = pd.DataFrame(
    {
        "open":   [100, 101, 102, 103, 104, 105],
        "high":   [101, 102, 103, 104, 105, 106],
        "low":    [ 99, 100, 101, 102, 103, 104],
        "close":  [101, 102, 103, 104, 105, 106],
        "volume": [ 10,  12,   8,  15,   9,  11],
    },
    index=pd.date_range("2026-01-02 09:30", periods=6, freq="1min"),
)

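Resampling requires a datetime-like index, unless you pass on="timestamp" to point resample at a datetime column. If your timestamps live in a regular column, either do that or move them into the index first with df.set_index("timestamp"). In both cases the column must hold real datetimes, so convert strings with pd.to_datetime first.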
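As a minimal sketch of the two options, assuming a hypothetical frame named raw whose timestamps arrive as strings in a column called timestamp (both names are illustrative, not from a real feed):

```python
import pandas as pd

# Hypothetical raw feed: timestamps arrive as strings in a regular column.
raw = pd.DataFrame(
    {
        "timestamp": ["2026-01-02 09:30", "2026-01-02 09:31", "2026-01-02 09:32"],
        "close": [101, 102, 103],
    }
)
raw["timestamp"] = pd.to_datetime(raw["timestamp"])  # strings -> real datetimes

# Option 1: move the timestamps into the index, then resample.
via_index = raw.set_index("timestamp").sort_index().resample("2min")["close"].last()

# Option 2: keep the column and tell resample where the timestamps live.
via_on = raw.resample("2min", on="timestamp")["close"].last()
```

Both routes should produce the same 2-minute series here. Setting the index is usually the more convenient choice when every later step is time-based.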

The OHLC aggregation dict

Each column aggregates differently. The open is the first price in the window, the high is the maximum, the low is the minimum, the close is the last, and volume sums. Encode that as a dict passed to .agg:

resample_ohlcv.py
agg = {
    "open":   "first",
    "high":   "max",
    "low":    "min",
    "close":  "last",
    "volume": "sum",
}
 
bars_5m = df.resample("5min").agg(agg)
print(bars_5m)

Never aggregate all columns the same way: taking the mean of high, for example, produces a price that may never have traded. The dict above is the canonical mapping; reuse it everywhere.
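One way to reuse the mapping is to wrap it in a small helper. The sketch below is illustrative only: the names OHLCV_AGG and resample_ohlcv are hypothetical, and the sample frame mirrors the six one-minute bars used earlier.

```python
import pandas as pd

# Canonical OHLCV mapping, defined once and reused.
OHLCV_AGG = {
    "open": "first",
    "high": "max",
    "low": "min",
    "close": "last",
    "volume": "sum",
}


def resample_ohlcv(bars: pd.DataFrame, rule: str) -> pd.DataFrame:
    """Roll OHLCV bars up to a coarser frequency (hypothetical helper)."""
    return bars.resample(rule).agg(OHLCV_AGG)


minute = pd.DataFrame(
    {
        "open": [100, 101, 102, 103, 104, 105],
        "high": [101, 102, 103, 104, 105, 106],
        "low": [99, 100, 101, 102, 103, 104],
        "close": [101, 102, 103, 104, 105, 106],
        "volume": [10, 12, 8, 15, 9, 11],
    },
    index=pd.date_range("2026-01-02 09:30", periods=6, freq="1min"),
)

bars_5m = resample_ohlcv(minute, "5min")
first_bar = bars_5m.iloc[0]  # covers 09:30 through 09:34

# Sanity check: the high must bound open and close from above, the low from below.
valid = (
    (bars_5m["high"] >= bars_5m[["open", "close"]].max(axis=1))
    & (bars_5m["low"] <= bars_5m[["open", "close"]].min(axis=1))
).all()
```

With this input, the first 5-minute bar has open 100, high 105, low 99, close 105, and volume 54. The sixth minute falls into a second, partial bar stamped 09:35.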

Dropping empty buckets

resample produces a row for every interval in the range, even ones with no underlying data. For 24/7 instruments that is fine, but for assets that trade on a session, off-hours buckets appear as rows with NaN prices and a volume of 0 (the sum of an empty bucket is 0, not NaN). A plain dropna(how="all") therefore misses them; drop them based on a column that is NaN only when the bucket is empty:

drop_empty.py
bars_5m = df.resample("5min").agg(agg).dropna(subset=["open"])
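To see what an empty bucket looks like before it is dropped, here is a small sketch with a deliberate gap in the data. The frame gappy and its values are made up for illustration.

```python
import pandas as pd

agg = {"open": "first", "high": "max", "low": "min", "close": "last", "volume": "sum"}

# Hypothetical bars with a gap: nothing trades between 09:32 and 09:45.
idx = pd.to_datetime(["2026-01-02 09:30", "2026-01-02 09:31", "2026-01-02 09:45"])
gappy = pd.DataFrame(
    {
        "open": [100.0, 101.0, 102.0],
        "high": [101.0, 102.0, 103.0],
        "low": [99.0, 100.0, 101.0],
        "close": [101.0, 102.0, 103.0],
        "volume": [10, 12, 8],
    },
    index=idx,
)

with_gaps = gappy.resample("5min").agg(agg)  # buckets at 09:30, 09:35, 09:40, 09:45
empty_row = with_gaps.loc[pd.Timestamp("2026-01-02 09:35")]

# Prices are NaN in the empty bucket, but volume is 0, so filter on a price column.
cleaned = with_gaps.dropna(subset=["open"])
```

The 09:35 and 09:40 rows carry NaN prices and zero volume. After the dropna, only the 09:30 and 09:45 bars remain.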

The closed and label gotchas

By default, for most frequencies pandas uses closed="left" and label="left". That means each bucket includes its left edge and excludes its right edge, and the bucket is labeled by its left edge. Concretely, a 5-minute bar stamped 09:30 covers [09:30, 09:35).

Conventions differ across data vendors. Some stamp a bar by the time it ends. If you mix sources without aligning this, your bars will appear shifted by one interval and any join across sources will be subtly wrong. Set both explicitly:

closed_label.py
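# Start-stamped bars: the bar labeled 09:30 covers [09:30, 09:35)
bars = df.resample("5min", closed="left", label="left").agg(agg)
# End-stamped bars: same buckets, but the bar covering [09:30, 09:35) is labeled 09:35
bars_right = df.resample("5min", closed="left", label="right").agg(agg)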

The label also determines when a bar appears to be known, which makes it a classic source of lookahead bias. With the default left label, a bar stamped 09:30 contains prices up to just before 09:35, so using it as if it were known at 09:30 leaks the future. A right label stamps the same bar 09:35, which matches the moment the bar is actually complete. See Avoiding Lookahead Bias in Backtests for how this propagates into signals.
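The following sketch compares the two labelings on the same buckets and derives the time at which each left-labeled bar is actually complete. The frame minute and the name available_at are hypothetical, chosen for illustration.

```python
import pandas as pd

agg = {"open": "first", "high": "max", "low": "min", "close": "last", "volume": "sum"}

# Hypothetical start-stamped 1-minute bars from 09:30 through 09:39.
opens = [100 + i for i in range(10)]
minute = pd.DataFrame(
    {
        "open": opens,
        "high": [o + 1 for o in opens],
        "low": [o - 1 for o in opens],
        "close": [o + 1 for o in opens],
        "volume": [1] * 10,
    },
    index=pd.date_range("2026-01-02 09:30", periods=10, freq="1min"),
)

left = minute.resample("5min", closed="left", label="left").agg(agg)
right = minute.resample("5min", closed="left", label="right").agg(agg)

# Same buckets, same values; only the stamps differ by one interval.
same_values = (left.to_numpy() == right.to_numpy()).all()
offset = right.index - left.index

# A left-labeled bar is only complete one interval after its stamp.
available_at = left.index + pd.Timedelta("5min")
```

In a backtest, one conservative choice is to key decisions on available_at (or on right labels) rather than on the left stamp, so a bar cannot be used before its window has closed.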

Timezones

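Equities resampled to daily bars must be aligned to the exchange's local session, not UTC. If your index is UTC, a naive daily resample cuts buckets at UTC midnight, which can split a single trading day across two calendar dates whenever the session crosses that boundary (US after-hours trading, for example, can run past midnight UTC). Convert to the exchange timezone first: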

timezone.py
# Assumes df has a timezone-aware (for example UTC) index; tz_convert raises on a naive one
df_local = df.tz_convert("America/New_York")
daily = df_local.resample("1D").agg(agg).dropna(subset=["open"])

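If your index is timezone-naive, localize it before converting: use tz_localize("UTC") when the naive timestamps represent UTC. If they are already exchange-local wall-clock times, localize directly to the exchange zone instead, and be careful around daylight-saving transitions. By default, localizing a timestamp that falls in the spring-forward gap or in the repeated fall-back hour raises, and you may need tz_localize(..., nonexistent="shift_forward", ambiguous="NaT") to handle them deliberately.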
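A small sketch of both points, using made-up hourly bars stored in UTC that span 09:00 to 19:00 New York time on a single day, followed by a naive timestamp that falls in the 2026 spring-forward gap. All frame names and values are illustrative.

```python
import pandas as pd

agg = {"open": "first", "high": "max", "low": "min", "close": "last", "volume": "sum"}

# Hypothetical hourly bars in UTC: 14:00 UTC on Jan 5 through 00:00 UTC on Jan 6,
# which is 09:00 through 19:00 in New York on Jan 5 (UTC-5 in winter).
idx = pd.date_range("2026-01-05 14:00", periods=11, freq="60min", tz="UTC")
prices = [100.0 + i for i in range(11)]
bars_utc = pd.DataFrame(
    {
        "open": prices,
        "high": [p + 1 for p in prices],
        "low": [p - 1 for p in prices],
        "close": prices,
        "volume": [1] * 11,
    },
    index=idx,
)

# Bucketing at UTC midnight splits the New York day in two.
daily_utc = bars_utc.resample("1D").agg(agg).dropna(subset=["open"])

# Converting first keeps the whole local day in one bar.
bars_ny = bars_utc.tz_convert("America/New_York")
daily_ny = bars_ny.resample("1D").agg(agg).dropna(subset=["open"])

# DST gap: 02:30 on 2026-03-08 does not exist in New York wall-clock time.
naive = pd.DatetimeIndex(["2026-03-08 02:30"])
shifted = naive.tz_localize("America/New_York", nonexistent="shift_forward")
```

Here the UTC resample yields two daily rows while the New York resample yields one. With shift_forward, the nonexistent 02:30 is moved to the first valid instant after the gap, 03:00 local time.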

Anchored offsets for sessions

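For week- or month-level bars, anchored frequencies control where each period begins or ends. "W-FRI" ends weeks on Friday (useful for equity weeks) and, because weekly frequencies default to closed="right" and label="right", stamps each bar with that Friday's date; "MS" anchors to month start. Pick the anchor that matches how you reason about the period: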

weekly.py
weekly = df.resample("W-FRI").agg(agg).dropna(subset=["open"])
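As an illustration of the Friday anchor, the sketch below builds two weeks of made-up daily bars (the frame daily and its values are hypothetical) and rolls them up with "W-FRI".

```python
import pandas as pd

agg = {"open": "first", "high": "max", "low": "min", "close": "last", "volume": "sum"}

# Hypothetical daily bars for ten business days: Mon 2026-01-05 through Fri 2026-01-16.
days = pd.bdate_range("2026-01-05", periods=10)
opens = [100.0 + i for i in range(10)]
daily = pd.DataFrame(
    {
        "open": opens,
        "high": [o + 1 for o in opens],
        "low": [o - 1 for o in opens],
        "close": [o + 0.5 for o in opens],
        "volume": [1] * 10,
    },
    index=days,
)

weekly = daily.resample("W-FRI").agg(agg).dropna(subset=["open"])
labels = list(weekly.index.strftime("%Y-%m-%d"))
```

Each weekly bar is stamped with the Friday that ends it (2026-01-09 and 2026-01-16 here), takes its open from Monday and its close from Friday, and sums five days of volume.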

Where to go next

With clean, correctly-timestamped bars you can compute returns on any timeframe — revisit Computing Returns and Equity Curves with pandas — and then evaluate a strategy with Performance Metrics Every Backtest Should Report. Getting the resampling conventions right here is what keeps those downstream numbers trustworthy.
