IV. Training Objective, Optimization Strategy & Learning Constraints
This section describes how the model from Section 3 is trained under temporal constraints, event imbalance, and data leakage risks, while establishing clear and verifiable statistical objectives.
Section objectives:
- Define the formal objective function
- Describe pipeline-level leakage prevention
- Address class imbalance and outcome sparsity
- Explain regularization for robustness
- Establish the regime-aware training framework
4.1 Learning Setting — Time-Consistent Supervised Learning
Let the event-token sequence for asset $a$ at evaluation time $t$ be

$$S_t^a = (e^a_{t-L+1}, \dots, e^a_t),$$

and define the outcome event (from Section 1) as $y_t^a \in \{1, \dots, K\}$.

The objective is to estimate

$$P(y_t^a = k \mid S_t^a)$$

using only information available strictly before time $t$.
This constitutes a causally valid learning setting, distinct from time-agnostic forecasting.
4.2 Time-Based Dataset Splitting (No Random Shuffle)
To prevent leakage, the dataset is split temporally, in strict chronological order:
Train → Validation → Test
Random splitting is prohibited because:
- Future events may indirectly appear in training
- Macro-regime patterns may leak across splits
Pipeline enforcement principles:
- Sequence builder restricts events to ≤ cutoff time
- Label builder references only future outcomes after cutoff
- Model loader verifies time-consistency
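The chronological split above can be sketched as follows; the function name `time_split` and the default fractions are illustrative, not part of the pipeline described here.

```python
import numpy as np

def time_split(timestamps, train_frac=0.7, val_frac=0.15):
    """Split sample indices chronologically: train < validation < test.

    timestamps: 1-D array-like of event times (need not be pre-sorted).
    Returns (train_idx, val_idx, test_idx) with no temporal overlap.
    """
    order = np.argsort(timestamps, kind="stable")   # chronological order
    n = len(order)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return (order[:n_train],
            order[n_train:n_train + n_val],
            order[n_train + n_val:])
```

Because the indices are taken from a chronologically sorted order, every training timestamp precedes every validation timestamp, which precedes every test timestamp; there is no random shuffle anywhere.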
4.3 Training Objective Function
Let the final sequence representation be $u_t^a$. The probability estimation function is

$$\hat{p}^a_t = \operatorname{softmax}(W u_t^a + b),$$

and the cross-entropy loss is

$$\mathcal{L}_{\mathrm{CE}} = -\sum_{k=1}^{K} \mathbf{1}[\,y_t^a = k\,] \log \hat{p}^a_{t,k}.$$
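A minimal sketch of this head and loss, assuming a linear output layer on the pooled representation $u_t^a$ (the helper names `softmax` and `cross_entropy` are illustrative):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(u, W, b, y):
    """Cross-entropy of a linear softmax head on pooled representations.

    u: (n, d) sequence representations u_t^a, W: (d, K), b: (K,),
    y: (n,) integer class labels.
    """
    p = softmax(u @ W + b)                  # p[i, k] ~ P(y_i = k | sequence)
    return -np.log(p[np.arange(len(y)), y]).mean()
```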
4.4 Class Imbalance & Event-Sparsity Handling
Since outcome events are typically rare, we define a weighted loss

$$\mathcal{L}_{w} = -\sum_{k=1}^{K} \omega_k\, \mathbf{1}[\,y_t^a = k\,] \log \hat{p}^a_{t,k},$$

where:
- $\omega_k \propto 1/\text{freq}(k)$, so rarer outcome classes receive larger weight
- Alternatively, a focal loss can be used to emphasize hard examples
Research rationale: the goal is not artificial class balancing, but preventing the model from "ignoring rare but important events."
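Both options can be sketched together; the helpers `focal_loss` and `inverse_freq_weights` are assumed names, and setting `gamma = 0` recovers plain weighted cross-entropy:

```python
import numpy as np

def focal_loss(p, y, weights, gamma=2.0):
    """Class-weighted focal loss on predicted probabilities.

    p: (n, K) predicted class probabilities, y: (n,) integer labels,
    weights: (K,) class weights, e.g. proportional to 1/freq(k),
    gamma: focusing parameter; gamma=0 recovers weighted cross-entropy.
    """
    p_true = p[np.arange(len(y)), y]        # probability of the true class
    return (weights[y] * (1.0 - p_true) ** gamma * -np.log(p_true)).mean()

def inverse_freq_weights(y, n_classes):
    """Weights proportional to inverse class frequency, normalized to mean 1."""
    counts = np.bincount(y, minlength=n_classes).astype(float)
    w = 1.0 / np.maximum(counts, 1.0)       # guard against empty classes
    return w * n_classes / w.sum()
```

The focusing term $(1 - p)^\gamma$ down-weights examples the model already classifies confidently, so gradient signal concentrates on the rare, hard events rather than the abundant easy ones.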
4.5 Regularization for Robust Pattern Learning
To avoid overfitting to short-term noise, we add regularization components.
(1) Feature-Gating Sparsity
A sparsity penalty on the feature gates (e.g. $\mathcal{L}_{\mathrm{sparse}} = \lVert g \rVert_1$) encourages the representation to select only the necessary features.
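One possible realization of such gating, sketched with assumed names (`gated_features`, `g_logits`): a sigmoid gate per feature whose L1 norm is added to the loss.

```python
import numpy as np

def gated_features(x, g_logits, l1_coef=1e-3):
    """Sigmoid feature gates with an L1 sparsity penalty.

    x: (n, d) features, g_logits: (d,) learnable gate logits (assumed),
    l1_coef: strength of the sparsity penalty.
    Returns the gated features and the penalty term to add to the loss.
    """
    g = 1.0 / (1.0 + np.exp(-g_logits))    # gate in (0, 1) per feature
    penalty = l1_coef * np.abs(g).sum()    # pushes unused gates toward 0
    return x * g, penalty
```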
(2) Token Dropout / Event Masking
At the sequence level, individual event tokens are randomly dropped or replaced with a mask token during training.
Semantic effects:
- Forces the model to learn patterns not dependent on any single token
- Increases robustness to missing signals
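A minimal sketch of token-level event masking, assuming integer token ids with a reserved mask id (`mask_id = 0` is an illustrative choice):

```python
import numpy as np

def event_mask(tokens, drop_prob=0.15, mask_id=0, rng=None):
    """Randomly replace sequence tokens with a mask id during training.

    tokens: (n, L) integer token ids, drop_prob: per-token masking rate,
    mask_id: reserved mask token (assumed to be 0 here).
    """
    rng = rng or np.random.default_rng()
    keep = rng.random(tokens.shape) >= drop_prob   # Bernoulli keep mask
    return np.where(keep, tokens, mask_id)
```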
(3) Temporal Smoothing Penalty
For cases where representations fluctuate abnormally between adjacent time steps, large consecutive changes are penalized:

$$\mathcal{L}_{\mathrm{smooth}} = \sum_t \lVert u^a_t - u^a_{t-1} \rVert_2^2.$$

This encodes the prior that patterns should accumulate gradually, not shift erratically due to isolated noise.
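A sketch of such a penalty, assuming the representations for one asset are stacked in time order (the name `temporal_smoothness` is illustrative):

```python
import numpy as np

def temporal_smoothness(U, coef=1e-2):
    """Penalize abrupt changes between consecutive representations u_t.

    U: (T, d) representations for one asset, ordered in time,
    coef: penalty strength.
    """
    diffs = np.diff(U, axis=0)                     # u_t - u_{t-1}
    return coef * (diffs ** 2).sum(axis=1).mean()  # mean squared step size
```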
(4) Total Objective

$$\mathcal{L} = \mathcal{L}_{w} + \lambda_1 \mathcal{L}_{\mathrm{sparse}} + \lambda_2 \mathcal{L}_{\mathrm{smooth}}$$
4.6 Macro-Aware Training as Context, Not Filter
Critically: regimes are not used to "split models" but serve as context within a unified sequence.
Therefore, a single unified model is trained across all regimes, but the representation $u_t^a$ is conditioned on the macro tokens present in the sequence.
This enables post-hoc testing of:
- Which patterns persist across eras
- Which patterns are era-specific
Without discarding data.
4.7 Leakage Prevention — Formal Checklist
A model is considered invalid if any of the following occur:
- Future events are present in the input sequence
- Macro-events are defined using hindsight
- Price-derived features indirectly use future information
- Validation/test sets temporally overlap with training data
We define a verification function $\mathrm{Verify}(\mathcal{D}) \in \{\text{valid}, \text{invalid}\}$ that checks these conditions, applied in the pipeline before every training run.
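A simplified sketch of such a check; the function name and arguments are hypothetical, and a real pipeline would cover all four conditions above rather than only the temporal ones.

```python
def verify_no_leakage(sequence_times, label_times, cutoff):
    """Pre-training sanity check (simplified sketch).

    sequence_times: event times used to build each input sequence,
    label_times: times at which each outcome label is resolved,
    cutoff: the evaluation time t separating inputs from outcomes.
    Returns True only if all inputs are at or before the cutoff and all
    labels are resolved strictly after it.
    """
    inputs_causal = all(s <= cutoff for s in sequence_times)
    labels_future = all(l > cutoff for l in label_times)
    return inputs_causal and labels_future
```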
4.8 Training as Representation Learning — Not Trading System
Research position: the learning objective is representation learning for pattern investigation, not performance optimization for trading.
Therefore, results are interpreted within this framework:
- Representation stability across eras
- Sequence-to-outcome relationships
- Not direct economic returns
4.9 Implementation View — For Development Teams
Training Pipeline
build causal sequence
→ encode tokens
→ transformer sequence encoder
→ attention pooling
→ output head
→ weighted/focal loss
→ add sparsity + temporal penalties
→ optimize
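The pipeline above can be compressed into a toy forward pass. Mean pooling stands in for the transformer encoder and attention pooling, and the sparsity term is omitted for brevity, so this is a structural sketch with illustrative names, not the actual model.

```python
import numpy as np

def train_step(tokens, y, E, W, b, class_w, lam_smooth=1e-2):
    """One toy forward pass mirroring the pipeline stages.

    tokens: (n, L) token ids for a time-ordered batch, y: (n,) labels,
    E: (V, d) token embeddings, W: (d, K), b: (K,) output head,
    class_w: (K,) class weights, lam_smooth: temporal penalty strength.
    """
    H = E[tokens]                        # encode tokens -> (n, L, d)
    u = H.mean(axis=1)                   # pooled sequence representation u_t
    z = u @ W + b                        # output head
    z = z - z.max(axis=1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    ce = -(class_w[y] * np.log(p[np.arange(len(y)), y])).mean()    # weighted loss
    smooth = lam_smooth * (np.diff(u, axis=0) ** 2).sum(axis=1).mean()
    return ce + smooth                   # total objective (sparsity term omitted)
```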
Validation Loop (time-aware)
- Fixed forward-rolling windows
- Prohibit look-ahead
- Explicit macro-context in sequence
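Fixed forward-rolling windows can be sketched as follows; the name `rolling_windows` and its parameters are illustrative.

```python
def rolling_windows(n, train_len, test_len, step):
    """Generate forward-rolling (train, test) index windows with no look-ahead.

    Each window trains on [start, start + train_len) and evaluates on the
    immediately following [start + train_len, start + train_len + test_len).
    """
    windows = []
    start = 0
    while start + train_len + test_len <= n:
        tr = range(start, start + train_len)
        te = range(start + train_len, start + train_len + test_len)
        windows.append((tr, te))
        start += step
    return windows
```

Because each test window starts exactly where its train window ends, evaluation data is always strictly in the future relative to the data the model was fitted on.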
4.10 Connection to Next Section
This section established:
- Learning objectives
- Temporal constraints
- Imbalance handling
- The role of macro context in training
The next section covers: Regime-Conditioned Analysis & Post-Training Diagnostics (post-training pattern analysis, per-era behavior, explainability).