MARFIN Research · Model Validation · Options Market Evidence

How the Options Market Validates the MARFIN Regime Framework

A walk-forward research study showing that MARFIN market states are not just internal labels. They carry information that appears in an independent market layer: implied volatility, volatility skew, and spread-shape mean reversion.

#1 ranked model across the key validation scorecard metrics
3.01 MARFIN MAE IV All, versus 4.99 for DTE-only
1.07 MARFIN MAE Skew Core Non-ATM, best among tested models
95.36% entry/exit proxy success rate for MARFIN spread-shape signals

What this article covers
  1. Why we used the options market as an independent validation layer.
  2. Why walk-forward testing matters for model credibility.
  3. How MARFIN was compared against DTE-only, VXN-only, MA200-only, and RandomState baselines.
  4. What the implied-volatility and skew error tests showed.
  5. Why spread-shape mean reversion supports the validity of MARFIN states.
  6. What this does and does not prove.
Core thesis

Why the options market is a useful validation test

Market regime models are often tested directly: returns, drawdowns, Sharpe ratio, crisis behavior, turnover, and portfolio-level outcomes. Those tests are useful, but they can be incomplete.

If a model was built to manage exposure, testing it only through its own allocation logic keeps the evaluation partly inside the same system. A stronger question is different:

If MARFIN states genuinely describe the market environment, should an independent market layer behave differently across those states?

The options market is a natural place to test that question. Option prices reflect expected volatility, downside protection demand, skew, tail risk, liquidity premium, and the market's aggregate perception of risk.

If MARFIN states were just arbitrary labels, they should not explain the option surface better than simple baselines. But if MARFIN captures real regime information, similar MARFIN states should be associated with similar implied-volatility structure, skew behavior, and spread-shape normalization patterns.

The validation idea: MARFIN should not only use market data to classify regimes. Those regimes should also be visible in the independent pricing behavior of the options market.

Research design

Why walk-forward validation matters

The biggest risk in this type of research is accidentally using future information. If expected values are estimated using the full historical sample and then tested on earlier dates, the result can look strong while being unrealistically informed by the future.

To avoid that, the study used a walk-forward structure.

For each tested date, expected values were estimated only from observations available before that date.

In plain English: the model did not know the future. This makes the validation closer to how the framework would be used in real time, where only past data is available at the decision point.
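The walk-forward rule above can be sketched in a few lines of Python. This is a minimal illustration, not the study's code: the function name `walk_forward_expected`, the tuple layout, and the use of a running mean as the "expected value" are assumptions made for the example. The only property taken from the text is that the estimate for a date uses observations from strictly earlier dates.

```python
from collections import defaultdict


def walk_forward_expected(observations):
    """Return (date, key, actual, expected) rows for date-sorted input.

    observations: list of (date, key, value) tuples sorted by date.
    The expected value is the mean of earlier values with the same key
    (an assumption; the source does not name the estimator).
    Same-day observations are held back so they cannot inform each other.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    pending = []          # observations of the current date, not yet in history
    current_date = None
    rows = []
    for date, key, value in observations:
        if date != current_date:
            # A new date starts: fold the previous date into history.
            for k, v in pending:
                sums[k] += v
                counts[k] += 1
            pending = []
            current_date = date
        expected = sums[key] / counts[key] if counts[key] else None
        rows.append((date, key, value, expected))
        pending.append((key, value))
    return rows


demo = [
    ("2020-01-01", "A", 10.0),
    ("2020-01-01", "A", 20.0),
    ("2020-01-02", "A", 30.0),
    ("2020-01-03", "A", 0.0),
]
rows = walk_forward_expected(demo)
```

In this toy input, the two observations on the first date have no history and get no expected value, while the third observation is compared only with the mean of the first two.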

Baselines

What MARFIN was compared against

A model should not be validated only against zero or a broad average. We compared MARFIN against several alternative explanations that could plausibly explain option-surface behavior.

DTE-only

Groups observations only by days to expiration. This is the simplest baseline for option data.

VXN-only

Uses DTE and rolling VXN quartiles. This is a strong volatility-aware baseline because VXN is directly tied to Nasdaq-100 expected volatility.

MA200-only

Uses DTE and whether QQQ is above or below its 200-day moving average. This represents a common trend-regime filter.

RandomState

Randomly shuffles MARFIN states across dates. This tests whether the real date-to-state mapping contains information.

The RandomState baseline is especially important. It preserves the general state structure, but breaks the link between MARFIN states and real market dates. If MARFIN were not informative, RandomState should perform similarly.
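One way to picture the RandomState baseline is a permutation of state labels across dates. The sketch below is an assumption about how such a shuffle could be built, with the hypothetical helper name `shuffle_states`; the source only says that states are randomly shuffled across dates while the general state structure is preserved.

```python
import random
from collections import Counter


def shuffle_states(states_by_date, seed=0):
    """Permute state labels across dates.

    Every label that occurred still occurs the same number of times,
    but it is no longer attached to the date on which it was observed.
    The seed makes the shuffle reproducible.
    """
    dates = sorted(states_by_date)
    labels = [states_by_date[d] for d in dates]
    rng = random.Random(seed)
    rng.shuffle(labels)
    return dict(zip(dates, labels))


# Hypothetical labels, written as Regime/AllocationState for brevity.
real = {
    "2022-01-03": "BULL/TICKER",
    "2022-01-04": "BULL/TICKER",
    "2022-01-05": "NEUTRAL/SPLV",
    "2022-01-06": "BEAR/BIL",
    "2022-01-07": "BEAR/GLD",
}
shuffled = shuffle_states(real, seed=42)
```

Because the label counts are unchanged, any gap between MARFIN and this baseline comes from the date-to-state mapping rather than from the number or size of the state groups.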

MARFIN state grid

How MARFIN groups market observations

MARFIN groups observations by a richer state structure than expiration alone:

Regime + AllocationState + ScoreBucket + DTE

This lets the model compare option observations not only by expiration, but also by the internal state of the market.

Market regimes

Regime    Interpretation
BULL      A constructive risk-on environment.
NEUTRAL   A mixed or transitional market environment.
BEAR      A defensive or risk-off environment.
UNKNOWN   A state where the model does not force a clean directional regime label.

Allocation states

Allocation state   Interpretation
TICKER             Growth / primary risk-asset exposure state.
SPLV               Transition / low-volatility equity state.
GLD                Hedge / gold-oriented defensive state.
BIL                Cash / T-bills state focused on capital preservation.

Score buckets

Bucket          Score range           Interpretation
score_lt_025    Score < 0.25          Low-score state.
score_025_050   0.25 ≤ Score < 0.50   Lower-middle score state.
score_050_075   0.50 ≤ Score < 0.75   Upper-middle score state.
score_gte_075   Score ≥ 0.75          High-score state.
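The score buckets and the four-part grouping key described above can be written down directly. The bucket names and boundaries come from the table; the `StateKey` type, the function names, and the choice to keep DTE as a raw integer are assumptions for illustration (the study may bucket DTE differently).

```python
from typing import NamedTuple


class StateKey(NamedTuple):
    """Grouping key: Regime + AllocationState + ScoreBucket + DTE."""
    regime: str
    allocation_state: str
    score_bucket: str
    dte: int


def score_bucket(score: float) -> str:
    """Map a score to the bucket names used in the table above."""
    if score < 0.25:
        return "score_lt_025"
    if score < 0.50:
        return "score_025_050"
    if score < 0.75:
        return "score_050_075"
    return "score_gte_075"


def state_key(regime: str, allocation_state: str, score: float, dte: int) -> StateKey:
    """Build the key used to group option observations."""
    return StateKey(regime, allocation_state, score_bucket(score), dte)


key = state_key("BEAR", "BIL", 0.31, 30)
```

Observations that share the same key are treated as comparable when expected values are estimated.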
Test layers

What was tested

The validation was not based on a single metric. It tested multiple layers of the option surface and spread behavior.

  1. Implied-volatility error. How close is the expected IV to the actual market IV?
  2. Skew error. How well does the model explain the shape of the smile relative to ATM?
  3. Spread-shape convergence. Does spread mispricing compress and move toward zero after extreme deviations?
  4. Entry/exit proxy. Does a simple z-score normalization framework identify deviations that revert toward normal?

Important: the spread tests are not a final executable PnL backtest. They evaluate volatility-spread convergence, not live trading profitability after bid/ask, slippage, commissions, margin, assignment risk, or execution constraints.

Result 1

MARFIN explains implied volatility better than the baselines

The first validation layer measures walk-forward implied-volatility error. The key metric is MAE IV, measured in volatility points. Lower is better.

Model         MAE IV All (vol pts)   MAE IV ATM (vol pts)
MARFIN        3.01                   3.26
VXN-only      3.69                   4.19
MA200-only    3.70                   4.15
RandomState   4.96                   5.65
DTE-only      4.99                   5.70

MARFIN produced the lowest error of all tested models. On MAE IV All, MARFIN reduced the error by roughly 18% versus VXN-only and MA200-only, and by roughly 40% versus DTE-only and RandomState.

The ATM result is even stronger: MARFIN reduced the ATM IV error by more than 42% versus DTE-only and RandomState.
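The percentage improvements quoted above follow from the table by simple arithmetic. The short check below uses the reported MAE values; the helper name `relative_reduction` is just an illustrative label for (baseline - model) / baseline.

```python
# Reported walk-forward MAE values, in volatility points.
mae_iv_all = {"MARFIN": 3.01, "VXN-only": 3.69, "MA200-only": 3.70,
              "RandomState": 4.96, "DTE-only": 4.99}
mae_iv_atm = {"MARFIN": 3.26, "VXN-only": 4.19, "MA200-only": 4.15,
              "RandomState": 5.65, "DTE-only": 5.70}


def relative_reduction(baseline: float, model: float) -> float:
    """Fraction by which the model's error is lower than the baseline's."""
    return (baseline - model) / baseline


def reductions_vs_marfin(table):
    """Relative error reduction of MARFIN against every other model."""
    marfin = table["MARFIN"]
    return {name: relative_reduction(value, marfin)
            for name, value in table.items() if name != "MARFIN"}


all_reductions = reductions_vs_marfin(mae_iv_all)
atm_reductions = reductions_vs_marfin(mae_iv_atm)

for name, frac in all_reductions.items():
    print(f"MAE IV All vs {name}: {frac:.1%}")
for name, frac in atm_reductions.items():
    print(f"MAE IV ATM vs {name}: {frac:.1%}")
```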

Result 2

MARFIN explains skew and smile shape better

Implied volatility level is only one part of the surface. Skew is more demanding because it measures the shape of the volatility smile: put wing, call wing, and the asymmetry of risk around ATM.

The key metric here is MAE Skew Core Non-ATM. Lower is better.

Model         MAE Skew All Non-ATM   MAE Skew Core Non-ATM
MARFIN        1.52                   1.07
VXN-only      1.66                   1.17
MA200-only    1.66                   1.18
RandomState   1.81                   1.30
DTE-only      1.81                   1.31

MARFIN again ranked first. In core non-ATM skew, MARFIN reduced the error by roughly 9% versus VXN-only and MA200-only, and by roughly 18% versus RandomState and DTE-only.

This matters because skew is not just a volatility level. It reflects downside protection demand, tail-risk pricing, and the market's structural perception of risk. MARFIN performing better on skew means the model is capturing more than a simple volatility filter.
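To make the distinction between level and shape concrete, here is a small sketch of a skew error. It assumes skew is measured as IV at a given moneyness minus ATM IV on the same smile, and that the error is the mean absolute difference between expected and actual skew over non-ATM points. These definitions, the function names, and the moneyness grid are illustrative assumptions; the study's exact "All" and "Core" definitions are not given here.

```python
def skew_curve(iv_by_moneyness, atm=1.0):
    """Return IV minus ATM IV for every non-ATM moneyness point."""
    atm_iv = iv_by_moneyness[atm]
    return {m: iv - atm_iv for m, iv in iv_by_moneyness.items() if m != atm}


def mae_skew(expected_smile, actual_smile, atm=1.0):
    """Mean absolute skew error over moneyness points present in both smiles."""
    expected = skew_curve(expected_smile, atm)
    actual = skew_curve(actual_smile, atm)
    common = sorted(set(expected) & set(actual))
    return sum(abs(expected[m] - actual[m]) for m in common) / len(common)


# Hypothetical smiles in volatility points, keyed by strike / spot.
expected_smile = {0.95: 24.0, 1.0: 20.0, 1.05: 18.0}
actual_smile = {0.95: 26.0, 1.0: 21.0, 1.05: 18.5}
error = mae_skew(expected_smile, actual_smile)

# A parallel shift of the whole smile changes the IV level but not the skew.
shifted = {m: iv + 3.0 for m, iv in expected_smile.items()}
shift_error = mae_skew(expected_smile, shifted)
```

Under this definition a model can miss the IV level and still match the smile shape, which is why skew is tested as a separate layer.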

Scorecard

MARFIN ranked first across the key validation metrics

A useful validation result should not depend on one cherry-picked metric. MARFIN ranked first across the main scorecard categories.

Test                          MARFIN rank   Best model   Interpretation
MAE IV ATM                    1             MARFIN       Lower is better
MAE Skew Core Non-ATM         1             MARFIN       Lower is better
20D SpreadShape Convergence   1             MARFIN       Higher is better
Entry/Exit Success Rate       1             MARFIN       Higher is better

The model is not only strong where it is convenient. It leads across IV accuracy, skew accuracy, spread-shape convergence, and the entry/exit proxy.

Result 3

MARFIN spread-shape deviations revert more reliably

The next layer tests whether spread-shape mispricing built from MARFIN states behaves like a mean-reverting spread.

The concept is simple: if MARFIN defines a better regime-conditioned normal shape, then deviations from that shape should more often compress and move back toward zero.

At the 20-trading-day horizon, the results were:

Model         Signals   Avg Abs Compression   Abs Compressed   Moved Toward Zero
MARFIN        690       1.11                  91.45%           92.90%
MA200-only    907       1.18                  90.30%           91.07%
RandomState   919       1.10                  89.34%           91.51%
DTE-only      907       1.01                  88.97%           91.07%
VXN-only      981       1.02                  87.77%           88.99%

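MARFIN had the highest percentage of compressed deviations and the highest percentage of movement toward zero, on fewer signals (690 versus 907 to 981 for the baselines). MA200-only showed a slightly higher average absolute compression (1.18 versus 1.11). Overall, this supports the idea that MARFIN-normalized spread-shape mispricing is not just a visual construct; it has historically shown stronger normalization behavior.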
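The convergence columns can be read through a simple sketch. The definitions below are assumptions chosen to match the column names, not the study's confirmed formulas: "absolute compression" is the drop in |mispricing| between the signal date and the horizon, "abs compressed" means that drop is positive, and "moved toward zero" means the change has the opposite sign to the starting deviation (so an overshoot through zero still counts).

```python
def convergence_stats(pairs):
    """Summarize (mispricing_at_signal, mispricing_at_horizon) pairs."""
    n = len(pairs)
    compression = [abs(start) - abs(end) for start, end in pairs]
    n_compressed = sum(1 for c in compression if c > 0)
    # The change (end - start) points toward zero when its sign opposes start.
    n_toward_zero = sum(1 for start, end in pairs if (end - start) * start < 0)
    return {
        "signals": n,
        "avg_abs_compression": sum(compression) / n,
        "pct_abs_compressed": 100.0 * n_compressed / n,
        "pct_moved_toward_zero": 100.0 * n_toward_zero / n,
    }


# Hypothetical deviations: two clean reversions, one overshoot, one widening.
stats = convergence_stats([(2.0, 0.5), (-3.0, -1.0), (1.0, -1.5), (2.0, 2.5)])
```

With these definitions the share that moved toward zero can never be lower than the share that compressed in absolute terms, which is the pattern in every row of the table.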

Result 4

The entry/exit proxy also favors MARFIN

The entry/exit proxy uses a simple z-score logic: enter when spread-shape mispricing becomes extreme, and exit when it returns closer to normal or reaches a time limit.

Entry: |z-score| >= 2
Exit:  |z-score| <= 0.5 or maximum holding window
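The rule above can be sketched as a loop over a z-score series. The thresholds 2 and 0.5 come from the rule; everything else is an assumption for illustration: the function name `run_proxy`, the `max_hold` parameter (the maximum holding window is not stated here, so the caller must supply it), and the definition of success as a smaller |z| at exit than at entry.

```python
def run_proxy(z, max_hold, entry_level=2.0, exit_level=0.5):
    """Apply the entry/exit rule to a list of daily z-scores.

    Enter when |z| >= entry_level. Exit on the first later day where
    |z| <= exit_level, or after max_hold days, whichever comes first.
    Trades that cannot be closed before the data ends are dropped.
    """
    trades = []
    i, n = 0, len(z)
    while i < n:
        if abs(z[i]) < entry_level:
            i += 1
            continue
        j = i + 1
        while j < n and j - i < max_hold and abs(z[j]) > exit_level:
            j += 1
        if j >= n:
            break  # series ended before an exit condition was met
        trades.append({
            "entry": i,
            "exit": j,
            "hold": j - i,
            "compression": abs(z[i]) - abs(z[j]),
            "success": abs(z[j]) < abs(z[i]),  # assumed success definition
        })
        i = j + 1  # look for the next entry after this exit
    return trades


# Hypothetical series: one reversion, then one deviation that keeps widening.
z_scores = [0.1, 2.3, 1.8, 0.9, 0.4, 0.0, -2.5, -2.6, -2.7, -2.8]
trades = run_proxy(z_scores, max_hold=3)
success_rate = 100.0 * sum(t["success"] for t in trades) / len(trades)
```

The output fields mirror the comparison columns: trade count, success rate, compression, and holding period.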

The comparison across models was:

Model         Trades   Success Rate   Avg Compression   Median Compression   Avg Hold (days)
MARFIN        690      95.36%         1.45              1.08                 8.33
MA200-only    907      94.16%         1.57              1.16                 9.99
RandomState   919      91.84%         1.37              0.97                 11.61
DTE-only      907      90.96%         1.29              0.89                 11.72
VXN-only      981      89.19%         1.36              0.87                 11.96

MARFIN produced the highest success rate and the shortest average holding period. MA200-only showed slightly higher average compression, but MARFIN had the stronger combination of stability, success rate, and speed of normalization.

Randomness check

RandomState shows that the MARFIN date mapping matters

RandomState is a critical sanity check. It asks what happens if we keep the MARFIN state structure but break its link to real market dates.

If MARFIN did not contain meaningful information, RandomState should perform similarly. It did not.

Implied volatility

RandomState MAE IV All was 4.96 versus 3.01 for MARFIN. RandomState MAE IV ATM was 5.65 versus 3.26 for MARFIN.

Skew

RandomState MAE Skew Core Non-ATM was 1.30 versus 1.07 for MARFIN.

Entry/exit proxy

RandomState success rate was 91.84% versus 95.36% for MARFIN.

Holding period

RandomState average hold was 11.61 days versus 8.33 days for MARFIN.

This does not prove that MARFIN is perfect. It does show that the real MARFIN state-to-date mapping carries information. When that mapping is randomized, quality drops materially.

Robustness

Year-by-year stability

The entry/exit proxy was also reviewed by year. The result was not concentrated in one isolated market period.

Year   Trades   Success Rate   Avg Compression
2016   84       90.48%         0.47
2017   49       97.96%         1.02
2018   155      98.71%         1.63
2019   34       97.06%         1.15
2020   69       98.55%         1.94
2021   4        100.00%        1.74
2022   193      92.75%         1.36
2023   102      95.10%         2.07

We do not over-interpret years with a small number of trades, especially 2021. Still, the broader pattern matters: the result appeared across different market environments, including 2018, 2020, and 2022, when volatility regimes changed sharply.
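As a consistency check, the yearly rows can be aggregated back to the headline MARFIN figures. The snippet below only uses numbers from the table above; weighting each year's success rate by its trade count is the natural way to recombine them.

```python
# (year, trades, success rate in %) from the year-by-year table.
yearly = [
    (2016, 84, 90.48),
    (2017, 49, 97.96),
    (2018, 155, 98.71),
    (2019, 34, 97.06),
    (2020, 69, 98.55),
    (2021, 4, 100.00),
    (2022, 193, 92.75),
    (2023, 102, 95.10),
]

total_trades = sum(trades for _, trades, _ in yearly)

# Trade-weighted success rate across all years.
weighted_success = sum(trades * rate for _, trades, rate in yearly) / total_trades

# Share of all trades contributed by each year, to show how little 2021 weighs.
trade_share = {year: trades / total_trades for year, trades, _ in yearly}

print(total_trades, round(weighted_success, 2))
```

The yearly trades sum to the 690 MARFIN trades in the model comparison, and the trade-weighted success rate matches the reported 95.36% to two decimals. The four trades from 2021 account for well under 1% of the sample.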

Interpretation

Why this is a reverse validation of MARFIN

A standard model test asks: if we use MARFIN to manage exposure, what happens to returns and drawdowns?

This study asks a different question: does an independent market layer recognize the same risk regimes that MARFIN identifies?

The results suggest that it does. MARFIN states help explain implied volatility, ATM IV, skew, smile shape, and the mean reversion of spread-shape deviations better than several simpler alternatives.

The reverse-validation argument: MARFIN does not only classify the market internally. Its states are reflected externally in the way the options market prices risk.

This is not a formal mathematical proof in the strict academic sense. It is an empirical walk-forward validation. But it is a strong one: the model's regime labels appear to carry information in an independent pricing layer.

Research chain

How this connects to MARFIN option-surface research

The validation fits into a broader research chain.

  1. MARFIN option surfaces. Build Fair Surface and Market-Expected Surface by regime, allocation state, score bucket, DTE, and moneyness.
  2. Spread Shape Mispricing. Move from the full surface to the shape of specific vertical spreads.
  3. Reverse model validation. Compare MARFIN with DTE-only, VXN-only, MA200-only, and RandomState to test whether MARFIN states contain independent information.

The conclusion is not only that MARFIN can help analyze options. The deeper point is that the options market helps validate MARFIN itself.

Limits

What we are not claiming

We are not claiming that MARFIN is a perfect model. We are not claiming final academic proof. We are not claiming that every z-score event is directly tradable. We are not claiming that compression in volatility points automatically becomes dollar profit.

The spread-shape tests do not fully model bid/ask execution, slippage, commissions, margin, assignment risk, early exercise, liquidity constraints, or gap risk.

The more precise claim is this:

The walk-forward evidence shows that MARFIN states contain additional market-regime information. That information appears in implied volatility, skew, and spread-shape mean reversion, and it outperforms several simple baseline models.

Conclusion

MARFIN states are visible in the option surface

The core value of MARFIN is not that it attempts to forecast every market tick. Its value is that it organizes market conditions into a structured regime framework.

This reverse validation shows that the structure is not arbitrary. MARFIN states help explain how the options market prices risk: implied volatility, skew, volatility smile shape, and the normalization of spread-shape deviations.

Across the walk-forward validation, MARFIN ranked first in the key scorecard metrics: ATM IV error, core non-ATM skew error, 20D spread-shape convergence, and entry/exit success rate.

The most important conclusion is simple:

MARFIN uses market data to define regimes.
The options market then reflects those regimes in its pricing of risk.

That is why we view this study as an important empirical validation of MARFIN as a market-regime framework.

Informational disclaimer

MARFIN is a market-regime and financial analytics framework. This material about model validation, option surfaces, implied volatility, volatility skew, and spread-shape mispricing is provided for informational and educational purposes only. It does not constitute investment advice, trading advice, portfolio management, brokerage, execution, or a recommendation to buy, sell, hold, hedge, allocate to, or avoid any security or option contract. Historical research and backtested relationships do not guarantee future results. Options involve substantial risk and are not suitable for all investors.