Event-Driven Representation Learning in Sparse Financial Time Series

A Macro-Contextual Conceptual Framework and Methodology

Niran Pravithana

V. Regime-Conditioned Analysis & Post-Training Diagnostics

This section is not part of the training process; rather, it is a critical component of the research methodology, addressing the central question:

After training the model on data from all eras, do the learned patterns exhibit genuine "sequence-to-outcome relationships"? And do these patterns persist across eras or remain era-specific?

This section therefore serves to:

  • Define post-training representation verification methods
  • Analyze patterns at the levels of event sequences, features, and macro context
  • Test pattern stability across time periods
  • Distinguish genuine signals from noise or overfitting

5.1 Regime Partition & Era-Specific Evaluation Framework

Let the set of time periods partitioned by market regime be

$$\mathcal{R} = \{R_1, R_2, \dots, R_L\}$$

where each $R_\ell$ is a time interval and the intervals are pairwise disjoint,

$$R_i \cap R_j = \varnothing, \quad i \ne j$$

covering the data range

$$\bigcup_{\ell=1}^L R_\ell = [T_{start}, T_{end}]$$

After training, we evaluate model behavior separately by era.

Let the sequence-level representation for asset $a$ at time $t$ be

$$u_t^a = F_{\Theta}(\tilde{\mathcal{S}}^a_{(-\infty,t)})$$

Let the set of representation-label pairs within era $R_\ell$ be

$$\mathcal{D}_{R_\ell} = \{(u_t^a, y_t^a) \mid t \in R_\ell\}$$

We compute the performance function

$$\Pi(R_\ell) = \mathbb{E}_{(u_t^a,\,y_t^a)\in\mathcal{D}_{R_\ell}} \Big[ \mathcal{M}(y_t^a,\hat{p}_t^a) \Big]$$

where $\hat{p}_t^a$ is the model's predicted outcome probability derived from $u_t^a$, and $\mathcal{M}$ may be:

  • AUROC
  • Brier score
  • Log-loss for rare events
  • Outcome-conditional metrics

The key insight: we are not measuring "market-wide accuracy," but examining whether pattern behavior remains stable or shifts across market eras.
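
To make the computation concrete, the sketch below evaluates $\Pi(R_\ell)$ era by era under the assumption of binary outcomes; the arrays `t`, `y`, `p_hat` and the mapping `era_bounds` are hypothetical names standing in for the event timestamps, labels, model probabilities, and the regime partition $\mathcal{R}$. Note that set-level metrics such as AUROC are computed over all of $\mathcal{D}_{R_\ell}$ at once rather than as a per-event average.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, brier_score_loss, log_loss

def per_era_performance(t, y, p_hat, era_bounds):
    """era_bounds maps era name -> (t_start, t_end); intervals are non-overlapping."""
    results = {}
    for era, (t_start, t_end) in era_bounds.items():
        mask = (t >= t_start) & (t < t_end)          # events falling inside era R_ell
        if mask.sum() == 0 or len(np.unique(y[mask])) < 2:
            continue                                  # skip eras without both outcome classes
        results[era] = {
            "auroc": roc_auc_score(y[mask], p_hat[mask]),
            "brier": brier_score_loss(y[mask], p_hat[mask]),
            "log_loss": log_loss(y[mask], p_hat[mask]),
            "n_events": int(mask.sum()),
        }
    return results
```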

5.2 Cross-Era Stability of Learned Patterns

To test which patterns are "universal across eras," we define a projection of representations into a latent subspace.

Let the projection function be

$$z_t^a = W_r u_t^a$$

We analyze representation distributions across different eras.

For example, we measure the distributional distance

$$D(R_i, R_j) = \text{MMD} \Big( \{z_t^a \mid t\in R_i\}, \{z_t^a \mid t\in R_j\} \Big)$$

or KL-divergence via density estimation.

Research interpretation:

  • If $D$ is low → pattern representations "persist across eras"
  • If $D$ is high → the pattern is "regime-specific"

The latter implies that patterns appearing significant in certain periods may result from era-specific market structure rather than universal regularities.
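
As one possible realization of $D(R_i, R_j)$, the sketch below computes a biased RBF-kernel MMD estimate with a median-heuristic bandwidth; `z_i` and `z_j` are hypothetical arrays of the projected representations $z_t^a$ from the two eras.

```python
import numpy as np

def _sq_dists(a, b):
    # Pairwise squared Euclidean distances between rows of a and rows of b.
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)

def mmd_rbf(z_i, z_j, sigma=None):
    d_ii, d_jj, d_ij = _sq_dists(z_i, z_i), _sq_dists(z_j, z_j), _sq_dists(z_i, z_j)
    if sigma is None:
        # Median heuristic for the RBF bandwidth.
        sigma = np.sqrt(np.median(np.concatenate(
            [d_ii.ravel(), d_jj.ravel(), d_ij.ravel()])) + 1e-12)
    k = lambda d: np.exp(-d / (2.0 * sigma ** 2))
    # Biased (V-statistic) estimate of squared MMD.
    return k(d_ii).mean() + k(d_jj).mean() - 2.0 * k(d_ij).mean()
```

Because the bandwidth rescales the statistic, comparisons across era pairs should hold a single fixed bandwidth rather than re-estimating it per pair.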

5.3 Event-Level Attribution via Attention-Weight Analysis

From Section 3, let the attention weight with which event $i$ attends to event $j$ be

$$\alpha_{ij} = \text{softmax}_j\left( \frac{q_i k_j^\top}{\sqrt{d}} \right)$$

normalized over the key index $j$. We define the event saliency score of event $i$ as the total attention it receives from the other positions,

$$s_i = \sum_j \alpha_{ji}$$

(the row sums $\sum_j \alpha_{ij}$ equal one by construction and therefore carry no information).

And average by era

$$S(x, R_\ell) = \mathbb{E} \big[ s_i \mid x_i = x, t_i \in R_\ell \big]$$

This measures how much event type $x$ contributes to patterns in each era.

This enables investigation of:

  • Which features are important only during QE regimes
  • Which are important only during crisis regimes
  • Which remain consistently important across all eras

Addressing the hypothesis: is pattern accumulation context-dependent?
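
The sketch below shows how $S(x, R_\ell)$ could be aggregated from stored attention matrices, assuming per-sequence matrices with queries along the rows; the containers `attn_matrices`, `event_types`, and `event_eras` are hypothetical names, not part of the framework's notation.

```python
import numpy as np
from collections import defaultdict

def era_saliency(attn_matrices, event_types, event_eras):
    """attn_matrices: per-sequence [n, n] arrays (rows indexed by queries);
    event_types / event_eras: per-sequence lists giving x_i and the era of t_i."""
    sums, counts = defaultdict(float), defaultdict(int)
    for A, types, eras in zip(attn_matrices, event_types, event_eras):
        s = np.asarray(A).sum(axis=0)          # s_i: total attention event i receives
        for s_i, x, era in zip(s, types, eras):
            sums[(x, era)] += float(s_i)
            counts[(x, era)] += 1
    return {key: sums[key] / counts[key] for key in sums}   # S(x, R_ell)
```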

5.4 Macro-Token Interaction Analysis

To examine the macro context under which micro-events are "operating":

We define the conditional attention weights restricted to macro-token keys

$$\alpha_{ij}^{(macro)} = \alpha_{ij} \quad\text{for } x_j\in\mathcal{X}_{macro}$$

And define the context-coupling score

$$C(x, m) = \mathbb{E} \big[ \alpha_{ij}^{(macro)} \mid x_i = x, x_j = m \big]$$

This measures the degree to which "feature event $x$" is coupled with macro event $m$.

Interpretation examples:

  • Feature-A has high coupling with QE-START
  • But very low coupling with QT-PHASE
  • This suggests the same pattern may "work" under QE but be "meaningless" under QT

This constitutes an analysis of contextual, causality-like behavior without making direct causal claims.
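
A minimal sketch of the coupling score $C(x, m)$ follows, using the same hypothetical attention-matrix containers as in Section 5.3; `macro_types` is an assumed set of macro event types corresponding to $\mathcal{X}_{macro}$.

```python
from collections import defaultdict

def context_coupling(attn_matrices, event_types, macro_types):
    """Mean attention from feature-event queries onto macro-token keys."""
    sums, counts = defaultdict(float), defaultdict(int)
    for A, types in zip(attn_matrices, event_types):
        for i, x_i in enumerate(types):
            if x_i in macro_types:
                continue                                   # queries: feature events only
            for j, x_j in enumerate(types):
                if x_j in macro_types:
                    sums[(x_i, x_j)] += float(A[i][j])     # alpha_ij with x_j a macro token
                    counts[(x_i, x_j)] += 1
    return {key: sums[key] / counts[key] for key in sums}  # C(x, m)
```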

5.5 Outcome-Conditional Pattern Profiling

To identify which event sequences appear disproportionately before specific outcomes:

We define the outcome-conditional representation sets

$$\mathcal{N}_k = \{u_t^a \mid y_t^a = k\}$$

And compute centroids

$$\mu_k = \frac{1}{|\mathcal{N}_k|} \sum_{u_t^a \in \mathcal{N}_k} u_t^a$$

With the distances of individual representations from each centroid

$$d(u_t^a, \mu_k)$$

If representations preceding "upward" events cluster tightly (low intra-cluster variance) in certain eras, this indicates recurring sequential patterns before those outcomes.

However, if the intra-cluster variance is very high → there is no evidence of stable patterns.
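
The sketch below computes the centroids $\mu_k$ and the intra-cluster dispersion of $d(u_t^a, \mu_k)$ separately per era; `u`, `y`, and `era_labels` are hypothetical, index-aligned arrays of representations, outcome labels, and era assignments.

```python
import numpy as np

def outcome_profiles(u, y, era_labels):
    profiles = {}
    for era in np.unique(era_labels):
        for k in np.unique(y):
            mask = (era_labels == era) & (y == k)
            if mask.sum() < 2:
                continue
            cluster = u[mask]
            mu_k = cluster.mean(axis=0)                       # centroid of outcome k
            dists = np.linalg.norm(cluster - mu_k, axis=1)    # d(u_t^a, mu_k)
            profiles[(era, k)] = {"centroid": mu_k,
                                  "mean_dist": dists.mean(),
                                  "dispersion": dists.std()}
    return profiles
```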

5.6 Per-Era Counterfactual-Style Consistency Check

We must distinguish between:

  • "The model performs well in that era due to overfitting"
  • vs "The pattern genuinely operates specifically in that era"

We therefore construct a cross-era evaluation.

Let the representations from era $R_i$ be

$$u^{(i)}_t$$

With the model parameters frozen, a pattern extracted from era $R_i$ (for example, a lightweight readout fit on $u^{(i)}_t$) is then evaluated on the distribution of era $R_j$, yielding

$$\Pi(R_j \mid R_i)$$

Interpretation: if patterns from era $R_i$ fail in era $R_j$ → this indicates the pattern is a regime-local phenomenon.

This is consistent with the hypothesis that some patterns arise from non-persistent market conditions.
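
One possible realization of $\Pi(R_j \mid R_i)$ is sketched below: the encoder is frozen, a lightweight logistic-regression probe is fit on the representations of era $R_i$, and its AUROC is measured on every other era. The probe and the metric are illustrative choices, not requirements of the methodology.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def cross_era_matrix(u, y, era_labels):
    """Pi(R_j | R_i): probe fit on era R_i, scored (AUROC) on era R_j."""
    eras = np.unique(era_labels)
    pi = {}
    for r_i in eras:
        fit_mask = era_labels == r_i
        if len(np.unique(y[fit_mask])) < 2:
            continue                                   # cannot fit a probe on one class
        probe = LogisticRegression(max_iter=1000).fit(u[fit_mask], y[fit_mask])
        for r_j in eras:
            eval_mask = era_labels == r_j
            if len(np.unique(y[eval_mask])) < 2:
                continue
            pi[(r_i, r_j)] = roc_auc_score(y[eval_mask],
                                           probe.predict_proba(u[eval_mask])[:, 1])
    return pi
```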

5.7 Noise-Hypothesis Baseline Tests

To filter spurious patterns, we perform the following baselines:

(1) Sequence Shuffle Test

Randomly permute event positions

$$\tilde{\mathcal{S}}^a_{\text{shuffle}}$$

If performance barely decreases → the model is not exploiting the true event ordering, i.e., the pattern is not structurally genuine.

(2) Macro-Masking Test

Temporarily remove macro tokens

$$\tilde{\mathcal{S}}^a_{\neg macro}$$

Then measure

$$\Delta\Pi = \Pi_{\text{full}} - \Pi_{\neg macro}$$

If $\Delta\Pi \approx 0$ → macro context contributes nothing meaningful.

(3) Random Feature Injection Test

Inject random synthetic events

$$z_i^{rand}$$

Then check whether the model treats them as meaningful patterns (e.g., by assigning them high saliency).

If so → the model is over-sensitive to noise.
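
The three baselines can be sketched as follows; `evaluate` is a hypothetical callable that scores the trained model on a list of event sequences and returns the chosen metric $\Pi$, each event is assumed to be a dict with a "type" field, and `make_random_event` is a user-supplied generator of synthetic events. None of these interfaces are prescribed by the framework itself.

```python
import random

def shuffle_test(evaluate, sequences, seed=0):
    """(1) Sequence shuffle: destroy the event ordering and compare Pi."""
    rng = random.Random(seed)
    shuffled = [rng.sample(seq, len(seq)) for seq in sequences]
    return evaluate(sequences) - evaluate(shuffled)    # near 0 => ordering is not used

def macro_masking_test(evaluate, sequences, macro_types):
    """(2) Macro masking: drop macro tokens and measure Delta Pi."""
    masked = [[e for e in seq if e["type"] not in macro_types] for seq in sequences]
    return evaluate(sequences) - evaluate(masked)      # near 0 => macro context adds nothing

def random_injection_test(evaluate, sequences, make_random_event, n_inject=5, seed=0):
    """(3) Random feature injection: insert synthetic events at random positions."""
    rng = random.Random(seed)
    noisy = []
    for seq in sequences:
        seq = list(seq)
        for _ in range(n_inject):
            seq.insert(rng.randrange(len(seq) + 1), make_random_event(rng))
        noisy.append(seq)
    return evaluate(sequences) - evaluate(noisy)       # large drop => over-sensitive to noise
```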

5.8 Interpretability for Engineering Integration

Research and engineering teams can extract results as:

  • Per-era attention heatmaps
  • Feature-saliency timelines
  • Macro-interaction graphs
  • Regime-cluster trajectories

These outputs support:

  • Downstream feature engineering
  • Formulating new hypotheses for subsequent phases
  • Sanity checking before practical application
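
As one concrete example of such an artifact, the sketch below renders the per-era saliency $S(x, R_\ell)$ from Section 5.3 as a heatmap; the `saliency` dictionary keyed by (event type, era) and the matplotlib rendering are assumptions about the engineering pipeline rather than requirements of the methodology.

```python
import numpy as np
import matplotlib.pyplot as plt

def saliency_heatmap(saliency, out_path="era_saliency.png"):
    """saliency: dict keyed by (event_type, era) -> mean saliency, as in Section 5.3."""
    types = sorted({x for x, _ in saliency})
    eras = sorted({era for _, era in saliency})
    grid = np.full((len(types), len(eras)), np.nan)
    for (x, era), value in saliency.items():
        grid[types.index(x), eras.index(era)] = value
    fig, ax = plt.subplots(figsize=(1 + len(eras), 1 + 0.4 * len(types)))
    im = ax.imshow(grid, aspect="auto")
    ax.set_xticks(range(len(eras)))
    ax.set_xticklabels([str(e) for e in eras], rotation=45, ha="right")
    ax.set_yticks(range(len(types)))
    ax.set_yticklabels([str(t) for t in types])
    fig.colorbar(im, ax=ax, label="mean saliency")
    fig.tight_layout()
    fig.savefig(out_path, dpi=150)
    plt.close(fig)
```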

5.9 Research Interpretation Protocol

Results are considered to have research value if:

  1. Patterns show statistically significant correlation with outcomes
  2. Patterns either partially persist across eras or exhibit era-specific relationships consistent with macro logic
  3. Results do not arise solely from noise baselines
  4. Patterns depend on joint event sequences, not single features

In other words: this work does not attempt to prove "price predictability," but rather to show that the event structure of markets exhibits non-random statistical relationships.

5.10 Connection to Next Section

This section summarized methods for extracting meaning from trained models. The next section covers: Experimental Design, Dataset Construction & Reproducibility Framework.

It will describe:

  • Systematic data preparation
  • Verifiable experiment design
  • Reproducibility standards for this work