V. Regime-Conditioned Analysis & Post-Training Diagnostics
This section is not part of the training process but is a critical component of the research methodology; it addresses the central question:
After training the model on data from all eras, do the learned patterns exhibit genuine "sequence-to-outcome relationships"? And do these patterns persist across eras or remain era-specific?
This section therefore serves to:
- Define post-training representation verification methods
- Analyze patterns at the levels of event sequences, features, and macro context
- Test pattern stability across time periods
- Distinguish genuine signals from noise or overfitting
5.1 Regime Partition & Era-Specific Evaluation Framework
Let the set of time periods partitioned by market regime be
$$\mathcal{R} = \{R_1, R_2, \dots, R_L\},$$
where each $R_\ell$ is a non-overlapping time interval ($R_i \cap R_j = \emptyset$ for $i \neq j$), together covering the data range ($\bigcup_{\ell=1}^{L} R_\ell = [t_0, T]$).
After training, we evaluate model behavior separately by era.
Let the sequence-level representation for asset $a$ at time $t$ be $h_{a,t} \in \mathbb{R}^d$, and let the set of events within era $R_\ell$ be $\mathcal{E}_\ell = \{(a,t) : t \in R_\ell\}$.
We compute the per-era performance
$$\Pi_\ell = \mathcal{M}\big(\{(\hat{y}_{a,t},\, y_{a,t}) : (a,t) \in \mathcal{E}_\ell\}\big),$$
where $\mathcal{M}$ may be:
- AUROC
- Brier score
- Log-loss for rare events
- Outcome-conditional metrics
The key insight: we are not measuring "market-wide accuracy," but examining whether pattern behavior remains stable or shifts across market eras.
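As a minimal sketch of this per-era evaluation (assuming binary outcomes and hypothetical arrays `preds`, `labels`, and `era_ids`; the names are illustrative, not part of this framework):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, brier_score_loss, log_loss

def per_era_performance(preds, labels, era_ids):
    """Compute Pi_ell = M(.) separately for each era ell."""
    results = {}
    for era in np.unique(era_ids):
        mask = era_ids == era
        y, p = labels[mask], preds[mask]
        if len(np.unique(y)) < 2:  # AUROC/log-loss undefined for one-class eras
            continue
        results[era] = {
            "auroc": roc_auc_score(y, p),
            "brier": brier_score_loss(y, p),
            "log_loss": log_loss(y, p),
        }
    return results
```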
5.2 Cross-Era Stability of Learned Patterns
To test which patterns are "universal across eras," we define a projection of representations into a latent subspace.
Let the projection function be $\phi : \mathbb{R}^d \to \mathbb{R}^k$, giving $u_{a,t} = \phi(h_{a,t})$, and let $P_\ell$ denote the distribution of projected representations within era $R_\ell$.
We analyze these distributions across different eras, for example by measuring a distributional distance
$$D(P_i, P_j) = \mathrm{MMD}(P_i, P_j)$$
or a KL-divergence $D_{\mathrm{KL}}(P_i \,\|\, P_j)$ estimated via density estimation.
Research interpretation:
- If $D$ is low → pattern representations "persist across eras"
- If $D$ is high → the pattern is "regime-specific"
The latter implies that patterns appearing significant in certain periods may result from era-specific market structure rather than universal regularities.
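A minimal sketch of the histogram-based KL estimate, assuming projected representations `u_i`, `u_j` of shape `(n_samples, k)` for two eras (a crude per-dimension approximation; a kernel-based distance such as MMD is a natural alternative):

```python
import numpy as np

def kl_divergence_hist(u_i, u_j, bins=30, eps=1e-8):
    """Approximate KL(P_i || P_j) as a sum of per-dimension KL terms
    estimated from shared-bin histograms."""
    total = 0.0
    for d in range(u_i.shape[1]):
        lo = min(u_i[:, d].min(), u_j[:, d].min())
        hi = max(u_i[:, d].max(), u_j[:, d].max())
        p, _ = np.histogram(u_i[:, d], bins=bins, range=(lo, hi))
        q, _ = np.histogram(u_j[:, d], bins=bins, range=(lo, hi))
        p = (p + eps) / (p + eps).sum()   # smooth, then normalise
        q = (q + eps) / (q + eps).sum()
        total += np.sum(p * np.log(p / q))
    return total
```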
5.3 Event-Level Attribution via Attention-Weight Analysis
From Section 3, let the attention weight from event $i$ to event $j$ be $\alpha_{ij}$.
We define the event saliency score of event $j$ as the total attention it receives,
$$s_j = \sum_i \alpha_{ij},$$
and average by era and event type:
$$\bar{s}_\ell(x) = \frac{1}{|\mathcal{E}_\ell(x)|} \sum_{j \in \mathcal{E}_\ell(x)} s_j,$$
where $\mathcal{E}_\ell(x)$ denotes the type-$x$ events in era $R_\ell$.
This measures how much event type $x$ contributes to patterns in each era.
This enables investigation of:
- Which features are important only during QE regimes
- Which are important only during crisis regimes
- Which remain consistently important across all eras
This addresses the hypothesis: is pattern accumulation context-dependent?
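A minimal sketch of the per-era saliency aggregation, assuming an attention matrix `attn` of shape `(T, T)` (row $i$ attends to column $j$) and per-token arrays `event_types` and `era_ids` (illustrative names):

```python
import numpy as np

def era_saliency(attn, event_types, era_ids):
    """Saliency of token j is the total attention it receives,
    s_j = sum_i alpha_ij; return the mean per (era, event type)."""
    s = attn.sum(axis=0)  # (T,)
    out = {}
    for era in np.unique(era_ids):
        for x in np.unique(event_types):
            mask = (era_ids == era) & (event_types == x)
            if mask.any():
                out[(era, x)] = float(s[mask].mean())
    return out
```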
5.4 Macro-Token Interaction Analysis
To examine the macro context that micro-events are "operating under," we define the conditional attention weight $\alpha_{x \to m}$ from a feature-event token of type $x$ to a macro token $m$, and define the context-coupling score
$$C_\ell(x, m) = \mathbb{E}\big[\alpha_{x \to m} \mid t \in R_\ell\big].$$
This measures the degree to which "feature event $x$" is coupled with macro event $m$.
Interpretation examples:
- Feature-A has high coupling with QE-START
- But very low coupling with QT-PHASE
- This suggests the same pattern may "work" under QE but be "meaningless" under QT
This constitutes an analysis of contextual, causality-like behavior without making direct causal claims.
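A minimal sketch of the coupling computation, assuming the same `(T, T)` attention matrix and a `types` array that labels both feature-event and macro tokens (a hypothetical interface):

```python
import numpy as np

def coupling_score(attn, types, era_ids, x, m):
    """C_ell(x, m): mean attention from type-x feature tokens to
    type-m macro tokens, computed separately per era."""
    out = {}
    for era in np.unique(era_ids):
        src = np.where((types == x) & (era_ids == era))[0]
        dst = np.where(types == m)[0]
        if len(src) and len(dst):
            out[era] = float(attn[np.ix_(src, dst)].mean())
    return out
```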
5.5 Outcome-Conditional Pattern Profiling
To identify which event sequences appear disproportionately before specific outcomes:
We define the neighborhood representations preceding outcome $y$ within era $R_\ell$,
$$\mathcal{H}_\ell(y) = \{h_{a,t} : (a,t) \in \mathcal{E}_\ell,\ y_{a,t} = y\},$$
compute centroids
$$\mu_\ell(y) = \frac{1}{|\mathcal{H}_\ell(y)|} \sum_{h \in \mathcal{H}_\ell(y)} h,$$
and use similarity measures such as cosine similarity, $\mathrm{sim}(h, \mu_\ell(y)) = \frac{h^\top \mu_\ell(y)}{\|h\|\,\|\mu_\ell(y)\|}$, alongside the intra-cluster variance.
If representations preceding "upward" events cluster tightly (low intra-cluster variance) in certain eras, this indicates recurring sequential patterns before those outcomes.
However, if the variance is very high → there is no evidence of stable patterns.
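A minimal sketch of the centroid/variance profiling, assuming `H` holds the pre-outcome representations with shape `(N, d)` and aligned `outcomes` and `era_ids` arrays (names are illustrative):

```python
import numpy as np

def outcome_profile(H, outcomes, era_ids):
    """Per (era, outcome): mean cosine similarity to the centroid
    (high = tight cluster) and total intra-cluster variance."""
    prof = {}
    for era in np.unique(era_ids):
        for y in np.unique(outcomes):
            mask = (era_ids == era) & (outcomes == y)
            if mask.sum() < 2:
                continue
            Z = H[mask]
            mu = Z.mean(axis=0)
            sims = (Z @ mu) / (np.linalg.norm(Z, axis=1)
                               * np.linalg.norm(mu) + 1e-12)
            prof[(era, y)] = {"mean_cos": float(sims.mean()),
                              "variance": float(Z.var(axis=0).sum())}
    return prof
```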
5.6 Per-Era Counterfactual-Style Consistency Check
We must distinguish between:
- "The model performs well in that era due to overfitting"
- vs "The pattern genuinely operates specifically in that era"
We therefore construct a cross-era evaluation.
Let a predictor be fit on representations from era $R_i$ and, with frozen model parameters, tested on the distribution of era $R_j$:
$$\Pi_{i \to j} = \mathcal{M}\big(\{(\hat{y}_{a,t},\, y_{a,t}) : (a,t) \in \mathcal{E}_j\}\big).$$
Interpretation: if patterns from era $R_i$ fail in era $R_j$ → this indicates the pattern is a regime-local phenomenon.
This is consistent with the hypothesis that some patterns may arise from non-persistent market conditions.
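A minimal sketch of the cross-era transfer matrix; a logistic-regression probe on frozen representations stands in for the evaluation head (an assumption, not necessarily the exact head used here):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def cross_era_transfer(H, labels, era_ids):
    """Pi[i, j]: fit a lightweight probe on era R_i's frozen
    representations, then score it on era R_j."""
    eras = np.unique(era_ids)
    Pi = np.full((len(eras), len(eras)), np.nan)
    for a, ei in enumerate(eras):
        tr = era_ids == ei
        if len(np.unique(labels[tr])) < 2:
            continue
        probe = LogisticRegression(max_iter=1000)
        probe.fit(H[tr], labels[tr])
        for b, ej in enumerate(eras):
            te = era_ids == ej
            if len(np.unique(labels[te])) < 2:
                continue
            Pi[a, b] = roc_auc_score(labels[te],
                                     probe.predict_proba(H[te])[:, 1])
    return Pi
```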
5.7 Noise-Hypothesis Baseline Tests
To filter spurious patterns, we perform the following baselines:
(1) Sequence Shuffle Test
Randomly permute event positions, $\tilde{\mathcal{S}} = \pi(\mathcal{S})$ for a random permutation $\pi$, and re-evaluate.
If performance barely decreases → the model is not using "true event ordering," i.e., the pattern is not structurally genuine.
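A minimal sketch of the shuffle test, assuming a `model_fn` that maps a list of event-sequence arrays to predictions and a `metric_fn(labels, preds)` (both hypothetical interfaces):

```python
import numpy as np

def shuffle_test(model_fn, sequences, labels, metric_fn,
                 n_trials=10, seed=0):
    """Score intact vs. position-permuted sequences; a negligible
    drop suggests the model ignores true event ordering."""
    rng = np.random.default_rng(seed)
    base = metric_fn(labels, model_fn(sequences))
    drops = []
    for _ in range(n_trials):
        shuffled = [s[rng.permutation(len(s))] for s in sequences]
        drops.append(base - metric_fn(labels, model_fn(shuffled)))
    return base, float(np.mean(drops))
```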
(2) Macro-Masking Test
Temporarily remove macro tokens, then measure
$$\Delta\Pi = \Pi(\text{full}) - \Pi(\text{macro-masked}).$$
If $\Delta\Pi \approx 0$ → macro context contributes nothing meaningful.
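A sketch under the same assumed interface, with a boolean `is_macro` mask per sequence; dropping the macro tokens is one masking mechanism (replacement with a learned mask token is another):

```python
def macro_masking_delta(model_fn, sequences, labels, metric_fn, is_macro):
    """Delta_Pi = Pi(full) - Pi(macro-masked)."""
    full = metric_fn(labels, model_fn(sequences))
    masked = [s[~m] for s, m in zip(sequences, is_macro)]
    return full - metric_fn(labels, model_fn(masked))
```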
(3) Random Feature Injection Test
Inject random synthetic events into the sequences, then check whether the model interprets them as patterns.
If so → the model is over-sensitive to noise.
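A sketch of the injection test under the same assumed interface; `make_noise_event` is a hypothetical generator returning one synthetic event row:

```python
import numpy as np

def injection_test(model_fn, sequences, labels, metric_fn,
                   make_noise_event, rate=0.05, seed=0):
    """Insert synthetic noise events at random positions and measure
    the performance shift; a large shift (or attribution mass landing
    on the fake events) indicates over-sensitivity to noise."""
    rng = np.random.default_rng(seed)
    injected = []
    for s in sequences:
        s2 = s.copy()
        k = max(1, int(rate * len(s)))
        # insert in descending position order so earlier indices stay valid
        for p in sorted(rng.integers(0, len(s) + 1, size=k), reverse=True):
            s2 = np.insert(s2, p, make_noise_event(), axis=0)
        injected.append(s2)
    return (metric_fn(labels, model_fn(sequences))
            - metric_fn(labels, model_fn(injected)))
```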
5.8 Interpretability for Engineering Integration
Research and engineering teams can extract results as:
- Per-era attention heatmaps
- Feature-saliency timelines
- Macro-interaction graphs
- Regime-cluster trajectories
These outputs support:
- Downstream feature engineering
- Formulating new hypotheses for subsequent phases
- Sanity checking before practical application
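As one example of these outputs, a minimal per-era attention-heatmap sketch with matplotlib, assuming `mean_attn_by_era` holds one `(n_types, n_types)` average attention matrix per era (the aggregation itself happens upstream):

```python
import matplotlib.pyplot as plt

def plot_era_heatmaps(mean_attn_by_era, era_names, type_names):
    """One panel per era: average attention between event types."""
    fig, axes = plt.subplots(1, len(era_names),
                             figsize=(4 * len(era_names), 4), squeeze=False)
    for ax, era, A in zip(axes[0], era_names, mean_attn_by_era):
        im = ax.imshow(A, cmap="viridis")
        ax.set_title(era)
        ax.set_xticks(range(len(type_names)))
        ax.set_xticklabels(type_names, rotation=90)
        ax.set_yticks(range(len(type_names)))
        ax.set_yticklabels(type_names)
    fig.colorbar(im, ax=axes.ravel().tolist(), shrink=0.8)
    plt.show()
```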
5.9 Research Interpretation Protocol
Results are considered to have research value if:
- Patterns show statistically significant correlation with outcomes
- Patterns either partially persist across eras or exhibit era-specific relationships consistent with macro logic
- Results do not arise solely from noise baselines
- Patterns depend on joint event sequences, not single features
In other words: this work does not attempt to prove "price predictability," but rather to show that the event structure of markets exhibits non-random statistical relationships.
5.10 Connection to Next Section
This section summarized methods for extracting meaning from trained models. The next section covers Experimental Design, Dataset Construction & Reproducibility Framework.
It will describe:
- Systematic data preparation
- Verifiable experiment design
- Reproducibility standards for this work