How to Build a Regime-Filtered Backtest That Actually Works
Most backtests lie to you. Not through fraud or bad data, but through aggregation. When you run a strategy over four years of Bitcoin price history and see a Sharpe ratio of 1.4, that single number blends wildly different performance across different market conditions. A strategy that crushes it during trending bull regimes and gets destroyed during choppy distribution phases can still look acceptable in aggregate. The regime-filtered backtest is the tool that unmasks this problem.
This post walks through why regime-aware filtering changes what you see, how to implement it technically, and what the resulting equity curve actually tells you about your strategy's real edge.
Why Aggregate Backtests Mislead
Bitcoin's price history is not a single environment. It contains extended accumulation phases, explosive markup periods, volatile distribution phases, and grinding bear regimes. Each of these states has different volatility profiles, different autocorrelation structures, and different responses to the same signals.
A momentum strategy, for example, might generate 80% of its returns during trending regimes that represent only 35% of historical time. During the remaining 65% — sideways chop, risk-off compression, liquidation cascades — that same strategy might give back a meaningful portion of those gains. Averaged together, the backtest looks fine. Filtered by regime, you see the truth: the strategy has a narrow window of genuine edge.
This is not a theoretical concern. As of 31 May 2026, Bitcoin is trading at $73,847 with a cautious, risk-off backdrop following recent geopolitical tensions. Sentiment data from late May flagged early liquidation pressure, with exits running ahead of sentiment — a classic leading indicator of regime stress. A strategy backtested without regime filtering would have no framework for recognising that the current environment is structurally different from the trending conditions of earlier months.
For a deeper grounding in what market regimes actually are and how they're classified, this primer on crypto market regimes is worth reading before going further.
Step 1: Define Your Regimes Before You Backtest
The most common mistake in a regime-filtered backtest is fitting or tuning the regime labels on the same data used to test the strategy, or letting the label at a given bar depend on data from later bars. The first is circular reasoning; the second is look-ahead bias. Your regime classification model must be built and locked before the backtest begins, and each label must use only information that was available at that bar.
A practical regime taxonomy for Bitcoin might include:
- Trending Bull: Sustained positive price momentum, low funding rate volatility, expanding open interest
- Distribution: Price near recent highs, increasing volatility, mixed derivatives signals
- Risk-Off / Liquidation: Sharp drawdowns, elevated funding rate swings, rapid open interest contraction
- Accumulation: Low volatility, compressed price range, recovering on-chain flows
- Trending Bear: Sustained downtrend, negative funding rates, declining participation
Crypto feature engineering for regime classification covers the specific input features that tend to produce stable, tradeable regime labels — including derivatives-based signals that lead price.
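One way to keep the labelling step free of look-ahead is to compute every model input from trailing windows only. The sketch below is a minimal illustration using pandas; the function name `trailing_features`, the two features, and the window length are assumptions chosen for the example, not a recommended feature set.

```python
import pandas as pd

def trailing_features(close: pd.Series, window: int = 20) -> pd.DataFrame:
    """Features for bar t built from bars <= t only (hypothetical feature set)."""
    returns = close / close.shift(1) - 1.0
    return pd.DataFrame({
        "momentum": close / close.shift(window) - 1.0,   # trailing return over `window` bars
        "volatility": returns.rolling(window).std(),     # trailing realised volatility
    })

# Illustrative prices, not real market data
close = pd.Series([100, 102, 101, 105, 107, 110, 108, 112, 115, 117], dtype=float)
features = trailing_features(close, window=3)

# Causality check: rewriting future bars must leave past features untouched
tampered = close.copy()
tampered.iloc[7:] = 50.0
unchanged = features.iloc[:7].equals(trailing_features(tampered, window=3).iloc[:7])
print(unchanged)
```

The tampering check is cheap to keep as a regression test: if any feature ever peeks ahead, the comparison fails.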
Step 2: Label Your Historical Data
Once your regime model is built, apply it to your full historical dataset to generate a regime label for each bar. Store this as a separate column alongside price, volume, and any other features.
A few practical notes:
Smoothing matters. Raw regime labels generated bar-by-bar will often flicker between states. A two-day trending signal followed by a one-day accumulation label followed by another trending label is noise, not signal. Apply a minimum holding period or a smoothing filter — for example, only change the regime label if the new state persists for at least three consecutive bars.
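The minimum-persistence rule described above can be sketched in a few lines of plain Python. The function name `smooth_labels` and the label strings are hypothetical. Note that the switch is applied on the bar where persistence is confirmed, so the smoothed series stays causal but lags the raw one by `min_persist - 1` bars.

```python
def smooth_labels(raw, min_persist=3):
    """Switch regime only after a new label has persisted `min_persist` consecutive bars."""
    if not raw:
        return []
    smoothed = []
    current = raw[0]
    candidate, run = None, 0
    for label in raw:
        if label == current:
            candidate, run = None, 0        # challenger failed to persist
        elif label == candidate:
            run += 1                        # challenger continues
        else:
            candidate, run = label, 1       # new challenger appears
        if candidate is not None and run >= min_persist:
            current = candidate             # confirmed: adopt the new regime
            candidate, run = None, 0
        smoothed.append(current)
    return smoothed

raw = ["bull", "bull", "accum", "bull", "bear", "bear", "bear", "bear"]
smoothed = smooth_labels(raw, min_persist=3)
print(smoothed)
```

In this example the single `accum` bar is ignored, and the change to `bear` is only accepted on the third consecutive `bear` bar.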
Transition zones are a regime. The period between a clear bull trend and a confirmed distribution phase is genuinely ambiguous. Treating it as a distinct state (or simply excluding it from regime-specific analysis) produces cleaner performance attribution than forcing it into one category.
Validate your labels out-of-sample. Before using regime labels in a backtest, check that they correspond to recognisable market conditions, ideally on a period the regime model was not fitted on. A label called "Risk-Off" should correspond to periods with elevated realised volatility and negative price momentum. If it doesn't, your model is mislabelling.
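A sanity check of this kind might look like the following sketch, which profiles each label by mean return and realised volatility. The column names, label strings, and numbers are made up for illustration; in practice the frame would hold a held-out period of your own data.

```python
import pandas as pd

# Illustrative per-bar returns with regime labels (not real data)
bars = pd.DataFrame({
    "ret": [0.01, 0.02, 0.015, -0.06, 0.05, -0.08, 0.012, 0.018],
    "regime": ["bull", "bull", "bull", "risk_off", "risk_off", "risk_off", "bull", "bull"],
})

# Profile each label: average return and standard deviation of returns
profile = bars.groupby("regime")["ret"].agg(mean_ret="mean", realised_vol="std")

# "Risk-Off" should be more volatile than "bull" and have negative drift
risk_off_ok = bool(
    profile.loc["risk_off", "realised_vol"] > profile.loc["bull", "realised_vol"]
    and profile.loc["risk_off", "mean_ret"] < 0
)
print(profile)
print(risk_off_ok)
```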
Step 3: Run the Backtest — Filtered and Unfiltered
Now run your strategy twice: once on all data, and once restricted to each individual regime.
The unfiltered run gives you the aggregate result you'd see in a standard backtest. The filtered runs give you regime-specific performance. The comparison between the two is where the insight lives.
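As a minimal sketch of the two runs, assume the strategy's per-bar returns already sit next to the regime column in one DataFrame. The column names and numbers are illustrative, and `total_return` is a hypothetical helper.

```python
import pandas as pd

# Illustrative per-bar strategy returns with a regime label per bar
bars = pd.DataFrame({
    "strategy_ret": [0.02, 0.03, -0.01, -0.02, 0.01, -0.03, 0.04, 0.02],
    "regime": ["bull", "bull", "chop", "chop", "chop", "chop", "bull", "bull"],
})

def total_return(returns: pd.Series) -> float:
    """Compounded return over the given bars."""
    return float((1.0 + returns).prod() - 1.0)

unfiltered = total_return(bars["strategy_ret"])                      # standard backtest
by_regime = bars.groupby("regime")["strategy_ret"].agg(total_return)  # one result per regime

print(round(unfiltered, 4))
print(by_regime.round(4))
```

Here the aggregate result is modestly positive, while the split shows a strong gain in `bull` bars and a loss in `chop` bars. That gap is exactly what the aggregate number hides.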
What you're looking for:
Regime concentration. If 90% of your strategy's total return comes from one regime that represents 25% of historical time, you have a highly regime-dependent strategy. That's not automatically bad — but it means your live performance will be tightly coupled to regime identification accuracy.
Regime-specific drawdowns. A strategy with a 15% maximum drawdown in aggregate might show a 40% drawdown on its distribution-only equity curve, because in the aggregate curve those losses are interleaved with gains booked in other regimes. That 40% number is the cumulative damage you'd actually take from trading the strategy live through distribution regimes without filtering.
Win rate and expectancy by regime. Some strategies have strong win rates in trending regimes but negative expectancy in accumulation phases. Others are steady across regimes but have higher variance in risk-off conditions. Neither is inherently better — but you need to know which type you have.
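The three diagnostics above (concentration, drawdown, and win rate with expectancy) can be computed in one pass over a trade log. The sketch below assumes a hypothetical log with one row per closed trade, a `pnl` column in account currency, and the regime at entry; it measures concentration by share of trades rather than share of time, which is a simplification.

```python
import pandas as pd

# Illustrative trade log (not real results)
trades = pd.DataFrame({
    "regime": ["bull", "bull", "bull", "chop", "chop", "chop", "chop", "chop"],
    "pnl":    [300.0, -100.0, 400.0, -50.0, 40.0, -60.0, 30.0, -60.0],
})

def max_drawdown(pnl: pd.Series) -> float:
    """Largest peak-to-trough drop of cumulative PnL, in currency units."""
    equity = pnl.cumsum()
    peak = equity.cummax().clip(lower=0.0)  # the starting equity of 0 counts as a peak
    return float((peak - equity).max())

stats = trades.groupby("regime")["pnl"].agg(
    n_trades="count",
    win_rate=lambda p: (p > 0).mean(),
    expectancy="mean",        # average PnL per trade
    total_pnl="sum",
    max_dd=max_drawdown,
)
stats["share_of_pnl"] = stats["total_pnl"] / trades["pnl"].sum()
stats["share_of_trades"] = stats["n_trades"] / len(trades)
print(stats)
```

In this toy log the `bull` regime holds 37.5% of trades but 120% of total PnL, because the `chop` trades lose money overall. A share above 100% is a clear sign of regime concentration.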
For context on how position sizing interacts with regime-specific performance, this guide on regime-based position sizing is directly applicable here.
Step 4: Read the Regime-Filtered Equity Curve
The regime-filtered equity curve is different from the standard equity curve. Instead of a single line from start to finish, you have a separate equity curve for each regime — showing what your account would look like if you only ever traded that regime.
A well-functioning regime-aware strategy should show:
- A positively sloping equity curve in the regimes where the strategy is designed to trade
- A flat or near-flat equity curve in regimes where the strategy is designed to stay out
- No large drawdowns in any single regime curve (if there are, the regime label isn't capturing the risk adequately)
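One simple way to build these per-regime curves is to zero out the strategy's returns on bars outside the regime and compound what remains. The sketch below assumes per-bar returns and labels in a single DataFrame with illustrative column names and values.

```python
import pandas as pd

# Illustrative per-bar strategy returns and regime labels
bars = pd.DataFrame({
    "strategy_ret": [0.02, 0.03, -0.01, -0.02, 0.01, 0.04],
    "regime": ["bull", "bull", "chop", "chop", "bull", "bull"],
})

# One equity curve per regime: returns outside the regime are replaced by 0,
# so the curve stays flat while that regime is inactive.
curves = pd.DataFrame({
    regime: (1.0 + bars["strategy_ret"].where(bars["regime"] == regime, 0.0)).cumprod()
    for regime in bars["regime"].unique()
})
print(curves.round(4))
```

Each column starts from 1.0 and only moves on bars carrying its label, so the curves can be plotted on a shared time axis and compared directly.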
The transition points between regimes are also informative. A strategy that performs well within regimes but takes large hits at regime boundaries has a transition latency problem — the signal that a regime has changed is arriving too late. This is one reason why detecting regime transitions early is a distinct problem from classifying steady-state regimes.
Step 5: Stress-Test the Regime Boundaries
Regime models are not perfect. In live trading, you will sometimes be one or two bars late in identifying a regime change. Your backtest should account for this.
Run a sensitivity analysis: what happens to your strategy's performance if regime labels are delayed by one bar, so that each label only becomes available one bar late? By two bars? By five bars? A robust regime-aware strategy should degrade gracefully under label lag rather than collapse.
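A lag sensitivity run can be sketched by delaying the label series and recomputing the filtered result for each delay. The helper names, the `bull` trade regime, and the numbers are hypothetical; the strategy here is simply "hold while the known label is `bull`".

```python
def lagged(labels, lag):
    """Delay labels by `lag` bars: at bar t the trader sees the label from bar t - lag."""
    if lag == 0:
        return list(labels)
    return [None] * lag + list(labels[:-lag])

def filtered_return(returns, labels, trade_regime="bull"):
    """Compound returns only on bars whose known label matches `trade_regime`."""
    equity = 1.0
    for r, label in zip(returns, labels):
        if label == trade_regime:
            equity *= 1.0 + r
    return equity - 1.0

# Illustrative returns and true labels
returns = [0.02, 0.03, 0.02, -0.04, -0.03, -0.02, 0.03, 0.02]
labels = ["bull", "bull", "bull", "bear", "bear", "bear", "bull", "bull"]

sensitivity = {lag: filtered_return(returns, lagged(labels, lag)) for lag in (0, 1, 2, 5)}
for lag, result in sensitivity.items():
    print(lag, round(result, 4))
```

In this toy series a one-bar delay already removes most of the return, and a two-bar delay turns it negative, which is the kind of cliff the test is meant to expose.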
Also test what happens if regime labels are occasionally wrong. Introduce a random 10% mislabelling rate into your regime column and rerun the backtest. If performance falls off a cliff under mild label noise, the strategy's edge is too dependent on perfect regime identification to be practical in live trading.
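Label noise can be injected with a seeded random generator so the stress test is repeatable. In this sketch `corrupt_labels` is a hypothetical helper that replaces each label, with probability `error_rate`, by a different state drawn uniformly from the states seen in the data; real misclassifications are unlikely to be uniform, so treat this as a rough first pass.

```python
import random

def corrupt_labels(labels, error_rate=0.10, seed=0):
    """Return a copy of `labels` with roughly `error_rate` of entries swapped to another state."""
    rng = random.Random(seed)            # seeded for reproducible stress tests
    states = sorted(set(labels))
    if len(states) < 2:
        raise ValueError("need at least two distinct regimes to mislabel")
    noisy = []
    for label in labels:
        if rng.random() < error_rate:
            noisy.append(rng.choice([s for s in states if s != label]))
        else:
            noisy.append(label)
    return noisy

labels = ["bull"] * 500 + ["bear"] * 500
noisy = corrupt_labels(labels, error_rate=0.10, seed=42)
observed_rate = sum(a != b for a, b in zip(labels, noisy)) / len(labels)
print(observed_rate)
```

Rerunning the backtest on `noisy` for several seeds gives a distribution of outcomes instead of a single point, which makes a performance cliff easier to see.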
What RegimeRisk's Regime Data Adds to This Process
Building a regime classification model from scratch requires significant feature engineering, model validation, and ongoing maintenance. RegimeRisk provides pre-built, continuously updated regime signals for Bitcoin, Ethereum, and Solana — including derivatives-derived features that tend to lead price-based signals by hours to days.
For traders building regime-filtered backtests, having access to a validated, historically consistent regime dataset removes one of the most error-prone steps in the process: the regime labelling itself. The regime classifications can be used directly as filter columns in a backtest, with confidence that the labels were generated using only information available at each historical point in time.
For reference on how regime signals interact with specific derivatives data, the analysis of Bitcoin's derivatives data versus raw regime intelligence covers the practical differences in signal construction.
Common Pitfalls to Avoid
Optimising for regime labels. If you tune your strategy parameters separately for each regime, you're overfitting. Regime filtering should determine when to trade, not how to parameterise the strategy in each regime. Keep strategy parameters fixed across regimes.
Ignoring transaction costs in transitions. Regime changes trigger position exits and entries. In a backtest, make sure you're accounting for the full round-trip cost of regime-triggered trades, not just signal-triggered trades.
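A rough way to account for this is to charge a cost every time the regime filter moves the strategy in or out of the market. The sketch assumes a long-or-flat strategy, a boolean "regime is tradeable" flag per bar, and a hypothetical `cost_per_side` of 0.1%; substitute your own fee and slippage estimate.

```python
def net_return_with_switch_costs(returns, in_regime, cost_per_side=0.001):
    """Compound returns while the regime is tradeable, paying a cost on every entry and exit."""
    equity = 1.0
    position = 0  # 0 = flat, 1 = long
    for r, active in zip(returns, in_regime):
        target = 1 if active else 0
        if target != position:
            equity *= 1.0 - cost_per_side   # regime-triggered entry or exit
            position = target
        if position:
            equity *= 1.0 + r
    if position:
        equity *= 1.0 - cost_per_side       # close out at the end: full round trip
    return equity - 1.0

# Illustrative returns; the regime filter steps out for one bar in the middle
returns = [0.01, 0.01, -0.02, 0.01, 0.01]
in_regime = [True, True, False, True, True]

gross = net_return_with_switch_costs(returns, in_regime, cost_per_side=0.0)
net = net_return_with_switch_costs(returns, in_regime)
print(round(gross, 5), round(net, 5))
```

A single one-bar flicker in the filter costs two extra sides here, which is one reason unsmoothed labels can be expensive in live trading.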
Treating regime filters as binary. Instead of a hard on/off switch, consider scaling position size proportionally to regime confidence. A high-confidence trending regime might warrant full position size; a borderline classification might warrant 50%. This produces smoother live performance than hard switches.
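A confidence-scaled filter can be as simple as a linear ramp between two thresholds. The thresholds below (`floor=0.5`, `full=0.9`) and the function name are assumptions for illustration, and the sketch presumes your regime model outputs a confidence or probability between 0 and 1.

```python
def position_size(confidence, floor=0.5, full=0.9, max_size=1.0):
    """Scale exposure linearly from 0 at `floor` confidence up to `max_size` at `full`."""
    if confidence <= floor:
        return 0.0                      # too uncertain: stay out
    if confidence >= full:
        return max_size                 # high confidence: full size
    return max_size * (confidence - floor) / (full - floor)

for c in (0.4, 0.7, 0.95):
    print(c, round(position_size(c), 3))
```

With these thresholds a borderline confidence of 0.7 maps to roughly half size, and anything at or above 0.9 gets the full position.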
Backtesting on too short a history. Regime-filtered backtests require enough occurrences of each regime to produce statistically meaningful results. If your backtest covers only 18 months of data and contains only two accumulation phases, your accumulation-regime performance estimate has wide confidence intervals. Extend the history, or be explicit about the uncertainty.
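Before trusting a regime-specific number, it helps to count how many separate episodes of each regime the history actually contains. The sketch below counts contiguous runs of each label; `count_episodes` and the label strings are hypothetical.

```python
from collections import Counter
from itertools import groupby

def count_episodes(labels):
    """Count contiguous runs (episodes) of each regime label."""
    return Counter(label for label, _ in groupby(labels))

# Illustrative label history: two accumulation phases, one bull, one bear
labels = ["accum"] * 3 + ["bull"] * 4 + ["accum"] * 2 + ["bear"] * 3
episodes = count_episodes(labels)
print(episodes)
```

Bars within one episode tend to be correlated, so the episode count is usually a more honest guide to sample size than the bar count.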
Key Takeaways
A regime-filtered backtest separates your strategy's performance by market state, revealing whether apparent edge is broad or concentrated in a few regimes. The most important output is not aggregate Sharpe or drawdown. It is the regime-specific equity curves, which show exactly where and when the strategy actually works.
Building this correctly requires that regime labels be generated without look-ahead bias, validated against recognisable market conditions, and stress-tested for label lag and noise sensitivity. Strategies that collapse under mild label imperfection are too fragile for live deployment.
In the current environment — Bitcoin at $73,847 with risk-off sentiment and early liquidation pressure visible in derivatives data — the difference between a regime-aware strategy and an unfiltered one is not academic. It determines whether you're positioned correctly for the regime that's actually present, or the one your aggregate backtest assumes is always present.
The practical goal is a strategy whose regime-filtered equity curves are individually interpretable, whose performance concentrations are understood and accepted, and whose behaviour at regime transitions has been explicitly tested. That is a backtest you can actually trust.