V. Regime-Conditioned Analysis & Post-Training Diagnostics
This section is not part of the training process but is a critical component of the research methodology; it addresses the central question:
After training the model on data from all eras, do the learned patterns exhibit genuine "sequence-to-outcome relationships"? And do these patterns persist across eras or remain era-specific?
This section therefore serves to:
- Define post-training representation verification methods
- Analyze patterns at the levels of event sequences, features, and macro context
- Test pattern stability across time periods
- Distinguish genuine signals from noise or overfitting
5.1 Regime Partition & Era-Specific Evaluation Framework
Let the set of time periods partitioned by market regime be
$$\mathcal{R} = \{R_1, R_2, \dots, R_L\},$$
where each $R_\ell$ is a non-overlapping time interval ($R_i \cap R_j = \emptyset$ for $i \neq j$), together covering the data range ($\bigcup_{\ell=1}^{L} R_\ell = [t_0, T]$).
After training, we evaluate model behavior separately by era.
Let the sequence-level representation for asset $a$ at time $t$ be $h_{a,t} \in \mathbb{R}^d$, and let the set of events within era $R_\ell$ be $\mathcal{E}_\ell = \{(a,t) : t \in R_\ell\}$.
We compute the per-era performance
$$\Pi_\ell = \mathcal{M}\big(\{(\hat{y}_{a,t},\, y_{a,t}) : (a,t) \in \mathcal{E}_\ell\}\big),$$
where $\mathcal{M}$ may be:
- AUROC
- Brier score
- Log-loss for rare events
- Outcome-conditional metrics
The key insight: we are not measuring "market-wide accuracy," but examining whether pattern behavior remains stable or shifts across market eras.
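As a minimal sketch of this per-era evaluation (assuming binary outcomes and hypothetical arrays `preds`, `labels`, and `era_ids`; the names are illustrative, not part of this framework):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, brier_score_loss, log_loss

def per_era_performance(preds, labels, era_ids):
    """Compute Pi_ell = M(.) separately for each era ell."""
    results = {}
    for era in np.unique(era_ids):
        mask = era_ids == era
        y, p = labels[mask], preds[mask]
        if len(np.unique(y)) < 2:  # AUROC/log-loss undefined for one-class eras
            continue
        results[era] = {
            "auroc": roc_auc_score(y, p),
            "brier": brier_score_loss(y, p),
            "log_loss": log_loss(y, p),
        }
    return results
```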
5.2 Cross-Era Stability of Learned Patterns
To test which patterns are "universal across eras," we define a projection of representations into a latent subspace.
Let the projection function be $\phi : \mathbb{R}^d \to \mathbb{R}^k$, giving $u_{a,t} = \phi(h_{a,t})$, and let $P_\ell$ denote the distribution of projected representations within era $R_\ell$.
We analyze these distributions across different eras, for example by measuring a distributional distance
$$D(P_i, P_j) = \mathrm{MMD}(P_i, P_j)$$
or a KL-divergence $D_{\mathrm{KL}}(P_i \,\|\, P_j)$ estimated via density estimation.
Research interpretation:
- If $D$ is low → pattern representations "persist across eras"
- If $D$ is high → the pattern is "regime-specific"
The latter implies that patterns appearing significant in certain periods may result from era-specific market structure rather than universal regularities.
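A minimal sketch of the histogram-based KL estimate, assuming projected representations `u_i`, `u_j` of shape `(n_samples, k)` for two eras (a crude per-dimension approximation; a kernel-based distance such as MMD is a natural alternative):

```python
import numpy as np

def kl_divergence_hist(u_i, u_j, bins=30, eps=1e-8):
    """Approximate KL(P_i || P_j) as a sum of per-dimension KL terms
    estimated from shared-bin histograms."""
    total = 0.0
    for d in range(u_i.shape[1]):
        lo = min(u_i[:, d].min(), u_j[:, d].min())
        hi = max(u_i[:, d].max(), u_j[:, d].max())
        p, _ = np.histogram(u_i[:, d], bins=bins, range=(lo, hi))
        q, _ = np.histogram(u_j[:, d], bins=bins, range=(lo, hi))
        p = (p + eps) / (p + eps).sum()   # smooth, then normalise
        q = (q + eps) / (q + eps).sum()
        total += np.sum(p * np.log(p / q))
    return total
```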
5.3 Event-Level Attribution via Attention-Weight Analysis
From Section 3, let the attention weight from event $i$ to event $j$ be $\alpha_{ij}$.
We define the event saliency score of event $j$ as the total attention it receives,
$$s_j = \sum_i \alpha_{ij},$$
and average by era and event type:
$$\bar{s}_\ell(x) = \frac{1}{|\mathcal{E}_\ell(x)|} \sum_{j \in \mathcal{E}_\ell(x)} s_j,$$
where $\mathcal{E}_\ell(x)$ denotes the type-$x$ events in era $R_\ell$.
This measures how much event type $x$ contributes to patterns in each era.
This enables investigation of:
- Which features are important only during QE regimes
- Which are important only during crisis regimes
- Which remain consistently important across all eras
This addresses the hypothesis: is pattern accumulation context-dependent?
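A minimal sketch of the per-era saliency aggregation, assuming an attention matrix `attn` of shape `(T, T)` (row $i$ attends to column $j$) and per-token arrays `event_types` and `era_ids` (illustrative names):

```python
import numpy as np

def era_saliency(attn, event_types, era_ids):
    """Saliency of token j is the total attention it receives,
    s_j = sum_i alpha_ij; return the mean per (era, event type)."""
    s = attn.sum(axis=0)  # (T,)
    out = {}
    for era in np.unique(era_ids):
        for x in np.unique(event_types):
            mask = (era_ids == era) & (event_types == x)
            if mask.any():
                out[(era, x)] = float(s[mask].mean())
    return out
```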
5.4 Macro-Token Interaction Analysis
To examine the macro context that micro-events are "operating under," we define the conditional attention weight $\alpha_{x \to m}$ from a feature-event token of type $x$ to a macro token $m$, and define the context-coupling score
$$C_\ell(x, m) = \mathbb{E}\big[\alpha_{x \to m} \mid t \in R_\ell\big].$$
This measures the degree to which "feature event $x$" is coupled with macro event $m$.
Interpretation examples:
- Feature-A has high coupling with QE-START
- But very low coupling with QT-PHASE
- This suggests the same pattern may "work" under QE but be "meaningless" under QT
This constitutes an analysis of contextual, causality-like behavior without making direct causal claims.
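A minimal sketch of the coupling computation, assuming the same `(T, T)` attention matrix and a `types` array that labels both feature-event and macro tokens (a hypothetical interface):

```python
import numpy as np

def coupling_score(attn, types, era_ids, x, m):
    """C_ell(x, m): mean attention from type-x feature tokens to
    type-m macro tokens, computed separately per era."""
    out = {}
    for era in np.unique(era_ids):
        src = np.where((types == x) & (era_ids == era))[0]
        dst = np.where(types == m)[0]
        if len(src) and len(dst):
            out[era] = float(attn[np.ix_(src, dst)].mean())
    return out
```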
5.5 Outcome-Conditional Pattern Profiling
To identify which event sequences appear disproportionately before specific outcomes:
We define the neighborhood representations preceding outcome $y$ within era $R_\ell$,
$$\mathcal{H}_\ell(y) = \{h_{a,t} : (a,t) \in \mathcal{E}_\ell,\ y_{a,t} = y\},$$
compute centroids
$$\mu_\ell(y) = \frac{1}{|\mathcal{H}_\ell(y)|} \sum_{h \in \mathcal{H}_\ell(y)} h,$$
and use similarity measures such as cosine similarity, $\mathrm{sim}(h, \mu_\ell(y)) = \frac{h^\top \mu_\ell(y)}{\|h\|\,\|\mu_\ell(y)\|}$, alongside the intra-cluster variance.
If representations preceding "upward" events cluster tightly (low intra-cluster variance) in certain eras, this indicates recurring sequential patterns before those outcomes.
However, if the variance is very high → there is no evidence of stable patterns.
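A minimal sketch of the centroid/variance profiling, assuming `H` holds the pre-outcome representations with shape `(N, d)` and aligned `outcomes` and `era_ids` arrays (names are illustrative):

```python
import numpy as np

def outcome_profile(H, outcomes, era_ids):
    """Per (era, outcome): mean cosine similarity to the centroid
    (high = tight cluster) and total intra-cluster variance."""
    prof = {}
    for era in np.unique(era_ids):
        for y in np.unique(outcomes):
            mask = (era_ids == era) & (outcomes == y)
            if mask.sum() < 2:
                continue
            Z = H[mask]
            mu = Z.mean(axis=0)
            sims = (Z @ mu) / (np.linalg.norm(Z, axis=1)
                               * np.linalg.norm(mu) + 1e-12)
            prof[(era, y)] = {"mean_cos": float(sims.mean()),
                              "variance": float(Z.var(axis=0).sum())}
    return prof
```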
5.6 Per-Era Counterfactual-Style Consistency Check
We must distinguish between:
- "The model performs well in that era due to overfitting"
- vs "The pattern genuinely operates specifically in that era"
We therefore construct a cross-era evaluation.
Let a predictor be fit on representations from era $R_i$ and, with frozen model parameters, tested on the distribution of era $R_j$:
$$\Pi_{i \to j} = \mathcal{M}\big(\{(\hat{y}_{a,t},\, y_{a,t}) : (a,t) \in \mathcal{E}_j\}\big).$$
Interpretation: if patterns from era $R_i$ fail in era $R_j$ → this indicates the pattern is a regime-local phenomenon.
This is consistent with the hypothesis that some patterns may arise from non-persistent market conditions.
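A minimal sketch of the cross-era transfer matrix; a logistic-regression probe on frozen representations stands in for the evaluation head (an assumption, not necessarily the exact head used here):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def cross_era_transfer(H, labels, era_ids):
    """Pi[i, j]: fit a lightweight probe on era R_i's frozen
    representations, then score it on era R_j."""
    eras = np.unique(era_ids)
    Pi = np.full((len(eras), len(eras)), np.nan)
    for a, ei in enumerate(eras):
        tr = era_ids == ei
        if len(np.unique(labels[tr])) < 2:
            continue
        probe = LogisticRegression(max_iter=1000)
        probe.fit(H[tr], labels[tr])
        for b, ej in enumerate(eras):
            te = era_ids == ej
            if len(np.unique(labels[te])) < 2:
                continue
            Pi[a, b] = roc_auc_score(labels[te],
                                     probe.predict_proba(H[te])[:, 1])
    return Pi
```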
5.7 Noise-Hypothesis Baseline Tests
To filter spurious patterns, we perform the following baselines:
(1) Sequence Shuffle Test
Randomly permute event positions, $\tilde{\mathcal{S}} = \pi(\mathcal{S})$ for a random permutation $\pi$, and re-evaluate.
If performance barely decreases → the model is not using "true event ordering," i.e., the pattern is not structurally genuine.
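A minimal sketch of the shuffle test, assuming a `model_fn` that maps a list of event-sequence arrays to predictions and a `metric_fn(labels, preds)` (both hypothetical interfaces):

```python
import numpy as np

def shuffle_test(model_fn, sequences, labels, metric_fn,
                 n_trials=10, seed=0):
    """Score intact vs. position-permuted sequences; a negligible
    drop suggests the model ignores true event ordering."""
    rng = np.random.default_rng(seed)
    base = metric_fn(labels, model_fn(sequences))
    drops = []
    for _ in range(n_trials):
        shuffled = [s[rng.permutation(len(s))] for s in sequences]
        drops.append(base - metric_fn(labels, model_fn(shuffled)))
    return base, float(np.mean(drops))
```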
(2) Macro-Masking Test
Temporarily remove macro tokens, then measure
$$\Delta\Pi = \Pi(\text{full}) - \Pi(\text{macro-masked}).$$
If $\Delta\Pi \approx 0$ → macro context contributes nothing meaningful.
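A sketch under the same assumed interface, with a boolean `is_macro` mask per sequence; dropping the macro tokens is one masking mechanism (replacement with a learned mask token is another):

```python
def macro_masking_delta(model_fn, sequences, labels, metric_fn, is_macro):
    """Delta_Pi = Pi(full) - Pi(macro-masked)."""
    full = metric_fn(labels, model_fn(sequences))
    masked = [s[~m] for s, m in zip(sequences, is_macro)]
    return full - metric_fn(labels, model_fn(masked))
```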
(3) Random Feature Injection Test
Inject random synthetic events into the sequences, then check whether the model interprets them as patterns.
If so → the model is over-sensitive to noise.
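A sketch of the injection test under the same assumed interface; `make_noise_event` is a hypothetical generator returning one synthetic event row:

```python
import numpy as np

def injection_test(model_fn, sequences, labels, metric_fn,
                   make_noise_event, rate=0.05, seed=0):
    """Insert synthetic noise events at random positions and measure
    the performance shift; a large shift (or attribution mass landing
    on the fake events) indicates over-sensitivity to noise."""
    rng = np.random.default_rng(seed)
    injected = []
    for s in sequences:
        s2 = s.copy()
        k = max(1, int(rate * len(s)))
        # insert in descending position order so earlier indices stay valid
        for p in sorted(rng.integers(0, len(s) + 1, size=k), reverse=True):
            s2 = np.insert(s2, p, make_noise_event(), axis=0)
        injected.append(s2)
    return (metric_fn(labels, model_fn(sequences))
            - metric_fn(labels, model_fn(injected)))
```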
5.8 Interpretability for Engineering Integration
Research and engineering teams can extract results as:
- Per-era attention heatmaps
- Feature-saliency timelines
- Macro-interaction graphs
- Regime-cluster trajectories
These outputs support:
- Downstream feature engineering
- Formulating new hypotheses for subsequent phases
- Sanity checking before practical application
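As one example of these outputs, a minimal per-era attention-heatmap sketch with matplotlib, assuming `mean_attn_by_era` holds one `(n_types, n_types)` average attention matrix per era (the aggregation itself happens upstream):

```python
import matplotlib.pyplot as plt

def plot_era_heatmaps(mean_attn_by_era, era_names, type_names):
    """One panel per era: average attention between event types."""
    fig, axes = plt.subplots(1, len(era_names),
                             figsize=(4 * len(era_names), 4), squeeze=False)
    for ax, era, A in zip(axes[0], era_names, mean_attn_by_era):
        im = ax.imshow(A, cmap="viridis")
        ax.set_title(era)
        ax.set_xticks(range(len(type_names)))
        ax.set_xticklabels(type_names, rotation=90)
        ax.set_yticks(range(len(type_names)))
        ax.set_yticklabels(type_names)
    fig.colorbar(im, ax=axes.ravel().tolist(), shrink=0.8)
    plt.show()
```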
5.9 Research Interpretation Protocol
Results are considered to have research value if:
- Patterns show statistically significant correlation with outcomes
- Patterns either partially persist across eras or exhibit era-specific relationships consistent with macro logic
- Results do not arise solely from noise baselines
- Patterns depend on joint event sequences, not single features
In other words: this work does not attempt to prove "price predictability," but rather to show that the event structure of markets exhibits non-random statistical relationships.
5.10 Connection to Next Section
This section summarized methods for extracting meaning from trained models. The next section covers Experimental Design, Dataset Construction & Reproducibility Framework.
It will describe:
- Systematic data preparation
- Verifiable experiment design
- Reproducibility standards for this work