VI. Experimental Design, Dataset Construction & Reproducibility Framework
This section establishes the methodological standards for the entire research:
- How data is prepared
- How labels and event sequences are constructed
- How data partitioning and time periods are designed
- How experiments are conducted to ensure unbiased and reproducible results
Formally: this section constitutes the "research protocol" ensuring that results derive from transparent, reproducible processes rather than trial-and-error tuning.
6.1 Dataset Definition — Multi-Asset Event-Time Panel
Let the asset universe be
$$\mathcal{A} = \{a_1, \dots, a_N\}$$
over a data time range $[T_{\min}, T_{\max}]$. For asset $a$, the unified event set (from Section 2) is
$$\mathcal{E}^a = \{(t_i^a, e_i^a)\}_{i=1}^{n_a}, \qquad t_i^a \in [T_{\min}, T_{\max}],$$
with discrete-time prices $P_t^a$. The complete system dataset is
$$\mathcal{D} = \bigl\{\bigl(\mathcal{E}^a,\ \{P_t^a\}\bigr)\bigr\}_{a \in \mathcal{A}}.$$
Structurally:
- This is event-time panel data
- Not based on uniform-interval sampling
- Emphasizing the "event → sequence → outcome" structure
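The event-time panel structure above can be sketched as plain Python containers. The class and field names (`Event`, `AssetPanel`) are illustrative, not part of the protocol; the point is that each asset carries an irregular, time-sorted event list alongside a discrete price series, rather than a uniformly sampled grid.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Event:
    """One timestamped event for a single asset (or a macro event)."""
    t: float                # event timestamp
    kind: str               # event type, e.g. "indicator_cross"
    payload: dict = field(default_factory=dict)

@dataclass
class AssetPanel:
    """Event-time panel for one asset: irregular events plus a price series."""
    asset: str
    events: list            # list[Event], sorted by t (irregular spacing)
    prices: dict            # {t: price}, discrete-time observations

# The full dataset D is then a mapping: asset id -> AssetPanel.
panel = AssetPanel(
    asset="AAA",
    events=[Event(t=1.0, kind="factor_shift"), Event(t=3.0, kind="indicator_cross")],
    prices={0.0: 100.0, 1.0: 101.2, 2.0: 99.8, 3.0: 100.5},
)
```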
6.2 Event Construction Protocol (Asset & Macro Levels)
6.2.1 Asset-Level Event Extraction
Events derive from various feature types:
- Boolean condition triggers
- Factor-state transitions
- Indicator crossing events
- Structural / fundamental signals
Define the event generator
$$g : \mathcal{F}_{\le t}^a \;\mapsto\; (t, e) \in \mathcal{E}^a,$$
where $\mathcal{F}_{\le t}^a$ denotes the features of asset $a$ observable up to time $t$.
Requirements:
- Events must derive from information known at that time
- Back-filled data is prohibited
- Timestamps must strictly precede outcomes
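A minimal sketch of the point-in-time discipline these requirements impose. The trigger rules (`rsi`, moving-average cross) and the function name `generate_events` are hypothetical examples, not the protocol's actual feature set; the essential part is that the generator only consumes features timestamped at or before the observation time, and fails loudly otherwise.

```python
def generate_events(features_by_time, now):
    """Asset-level event generator g: emits (t, event_type) pairs using only
    information known at each timestamp t <= now (no back-filled data)."""
    events = []
    for t in sorted(features_by_time):
        if t > now:
            # Guard against future leakage: refuse features dated after `now`.
            raise ValueError(f"feature timestamped {t} lies after observation time {now}")
        feats = features_by_time[t]
        # Boolean condition trigger (illustrative rule)
        if feats.get("rsi", 50.0) < 30.0:
            events.append((t, "rsi_oversold"))
        # Indicator crossing event (illustrative rule)
        if feats.get("ma_fast", 0.0) > feats.get("ma_slow", 0.0) and not feats.get("was_above", False):
            events.append((t, "ma_cross_up"))
    return events
```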
6.2.2 Macro-Level Event Definition
The macro event set
$$\mathcal{E}^{\text{macro}} = \{(t_j, m_j)\}_{j=1}^{n_m}$$
must be defined according to ex-ante observable rules, such as:
- Official QE start dates from announcements
- Interest rate change dates
- Shock dates recorded in public sources
Hindsight definitions are prohibited, such as "retrospectively considering this period a crisis."
Source documentation must be specified and definitions frozen before training begins.
6.2.3 Unified Event Merge Procedure
The sequence merge process (for asset $a$):
$$\mathcal{S}^a = \operatorname{sort}_t\!\bigl(\mathcal{E}^a \cup \mathcal{E}^{\text{macro}}\bigr).$$
Enforced invariants:
- Non-decreasing time: $t_1^a \le t_2^a \le \dots$
- Identical macro events across all assets
- No retroactive event creation
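The merge and its invariants can be expressed directly, assuming both input streams are already time-sorted lists of `(t, event)` pairs (the function name `merge_event_streams` is illustrative). Because the same macro stream is passed for every asset, the "identical macro events across all assets" invariant holds by construction, and the non-decreasing-time invariant is asserted explicitly.

```python
import heapq

def merge_event_streams(asset_events, macro_events):
    """Merge asset-level and macro-level event streams (each sorted by time)
    into one unified sequence; macro events are shared across all assets."""
    merged = list(heapq.merge(asset_events, macro_events, key=lambda ev: ev[0]))
    # Invariant: non-decreasing time t_1 <= t_2 <= ...
    assert all(merged[i][0] <= merged[i + 1][0] for i in range(len(merged) - 1))
    return merged
```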
6.3 Outcome Label Construction
Let future returns be
$$r_{t \to t+\Delta}^a = \frac{P_{t+\Delta}^a - P_t^a}{P_t^a}.$$
Define the outcome function $y_t^a = f\!\bigl(r_{t \to t+\Delta}^a\bigr)$. Threshold-based example:
$$y_t^a = \begin{cases} +1, & r_{t \to t+\Delta}^a \ge \tau \\ 0, & \lvert r_{t \to t+\Delta}^a \rvert < \tau \\ -1, & r_{t \to t+\Delta}^a \le -\tau \end{cases}$$
Critical requirements:
- $y_t^a$ must use only prices in the interval $[t, t+\Delta]$
- Median / forward fill across future periods is prohibited
- $\tau$ and $\Delta$ must be specified before experiments and locked
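The threshold-based label can be sketched as follows (the name `make_label` is illustrative). Note the two requirements embedded in the code: only the prices at $t$ and $t+\Delta$ are touched, and a missing future price yields `None` rather than any filled-in value.

```python
def make_label(prices, t, delta, tau):
    """Threshold-based outcome y_t^a using only prices in [t, t + delta].
    tau and delta must be fixed and locked before experiments begin."""
    if t not in prices or (t + delta) not in prices:
        return None  # no forward fill / interpolation across future periods
    r = (prices[t + delta] - prices[t]) / prices[t]
    if r >= tau:
        return 1
    if r <= -tau:
        return -1
    return 0
```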
6.4 Causal-Safe Training Window Construction
For each sample at time $t$, the model input may contain only events whose timestamps do not exceed $t$:
$$X_t^a = \{(t_i^a, e_i^a) : t_i^a \le t\}.$$
Rolling retrospective windows that consume future data are prohibited.
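A minimal sketch of this causal-window rule (the helper name `causal_window` is an assumption): the input at time $t$ is a strict filter on event timestamps, and any length cap trims from the past, never by reaching into the future.

```python
def causal_window(events, t, max_len=None):
    """Training input X_t: only events with timestamp <= t.
    Windows that would consume future events are excluded by construction."""
    window = [ev for ev in events if ev[0] <= t]
    if max_len is not None:
        window = window[-max_len:]  # cap length by dropping the oldest events
    return window
```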
6.5 Temporal Splitting & Forward Evaluation Protocol
To maintain time-causal validity, partition the time range into:
- Train: $[T_0, T_1)$
- Validation: $[T_1, T_2)$
- Test: $[T_2, T_3)$
with
$$T_0 < T_1 < T_2 < T_3,$$
and rolling-forward evaluation: all three windows advance together in fixed steps, producing a sequence of out-of-sample test segments.
Benefits:
- Verification of performance stability over time
- Reduced risk of selecting biased periods
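The rolling-forward protocol can be sketched as a generator of interval triples (names and the half-open-interval convention are illustrative assumptions). Each yielded triple satisfies train < validation < test in time, and successive triples advance by a fixed step, so stability can be assessed across multiple test segments.

```python
def rolling_splits(t_min, t_max, train_len, val_len, test_len, step):
    """Yield (train, val, test) half-open time intervals that roll forward,
    preserving train < val < test so no split ever sees future data."""
    start = t_min
    while start + train_len + val_len + test_len <= t_max:
        train = (start, start + train_len)
        val = (train[1], train[1] + val_len)
        test = (val[1], val[1] + test_len)
        yield train, val, test
        start += step
```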
6.6 Experimental Arms (What We Compare Against)
For meaningful research results, comparable baselines are required.
6.6.1 Baseline Models
- Random / Majority baseline
- Logistic regression on aggregated features
- GRU / LSTM (no macro tokens)
- Transformer without macro tokens
- Transformer with macro tokens (proposed)
Objective: not to "beat" baselines, but to demonstrate that incorporating "event sequences + macro context" provides structural informational value.
6.7 Hyperparameter Governance (Pre-Analysis Rule)
To avoid tuning bias, specify pre-registered search ranges before any evaluation, for example learning rate, number of layers, hidden width, and dropout, each restricted to a declared interval.
Final model selection:
- Based on time-split validation
- Retrospective selection from test set is prohibited
6.8 Reproducibility Requirements
Research is considered reproducible when it includes:
- Versioned dataset recipe (raw data need not be released, but construction formula must be)
- Config-locked experiment files (e.g., YAML / JSON)
- Recorded commit hash, random seed, hyperparameters, training logs
- Consistency verification (e.g., a deterministic hash of each constructed event sequence) confirming that sequences remain unchanged between runs
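One simple realization of such a consistency check, assuming event sequences are JSON-serializable lists of `(t, event)` pairs (the function name `sequence_fingerprint` is illustrative): hash a canonical serialization and compare fingerprints across runs or machines.

```python
import hashlib
import json

def sequence_fingerprint(merged_events):
    """Deterministic SHA-256 fingerprint of a constructed event sequence;
    equal fingerprints across runs show the dataset recipe reproduced exactly."""
    canonical = json.dumps(merged_events, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()
```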
6.9 Error & Risk Audit — What Can Go Wrong
For transparency, potential risks must be assessed:
- Regime mis-labeling
- Survivorship bias in stock universe
- Corporate actions causing price jumps
- Missing-event distortion
- Redundancy of correlated events
All items should be documented in a risk appendix.
6.10 Interpretation Scope & Ethical Boundary
This document specifies that:
- Results are for structural research purposes
- Not to be interpreted as profit prediction tools
- No direct causal claims are made
- This is pattern-relation research only
6.11 Connection to Final Sections
With Section 6 establishing the foundations for dataset construction, experimental protocol, and reproducibility, the next section addresses Limitations, Extensions & Future Research Directions.