II. Event Representation & Embedding Framework
This section bridges raw data and the mathematical framework the model learns from. The objectives are to:
- Support sparse, asynchronous, heterogeneous event streams
- Integrate asset-level events and macro-level events within a unified sequence
- Provide a directly implementable system architecture
2.1 Formal Event Structure
A single event is represented as a triple
$$e_i = (t_i,\; x_i,\; v_i),$$
where
- $t_i$ = event timestamp
- $x_i$ = event type (feature / indicator / macro tag)
- $v_i$ = quantitative value (Boolean, discrete, or continuous)
The complete event domain is defined as the union
$$\mathcal{E} = \mathcal{E}_{\text{asset}} \cup \mathcal{E}_{\text{macro}}.$$
Events may therefore be either:
- Company/stock-level signals (micro-structure / strategy / factor signals)
- Macroeconomic events (QE, QT, crisis flags, policy shocks, etc.)
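As a minimal sketch, this triple maps directly onto a small data structure (Python shown here; the `Event` class and its field names are illustrative, not part of the formal spec):

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Event:
    """One event e_i = (t_i, x_i, v_i)."""
    t: float                    # t_i: event timestamp (e.g., Unix seconds)
    x: str                      # x_i: event type (feature / indicator / macro tag)
    v: Union[bool, int, float]  # v_i: Boolean, discrete, or continuous value
```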
2.2 Unified Event Sequence
For asset $a$, let there be an asset-specific event set
$$\mathcal{E}^{a} = \{ e^{a}_{1}, e^{a}_{2}, \dots \}$$
and a shared set of market-wide macroeconomic events
$$\mathcal{E}^{\text{macro}} = \{ e^{m}_{1}, e^{m}_{2}, \dots \}.$$
We construct a unified event sequence by merging the two sets and sorting by timestamp:
$$S^{a} = \operatorname{sort}_{t}\!\left( \mathcal{E}^{a} \cup \mathcal{E}^{\text{macro}} \right).$$
The result is a single sequence containing both:
- Events specific to that stock
- Macro events occurring within the same time period
Events are ordered chronologically, enabling the model to perceive continuous temporal relationships between events at both levels within a unified structure.
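A sketch of this merge, reusing the hypothetical `Event` class from Section 2.1:

```python
def unify_events(asset_events: list, macro_events: list) -> list:
    """Merge asset-specific and market-wide events into one sequence,
    ordered chronologically by the timestamp t_i."""
    return sorted(asset_events + macro_events, key=lambda e: e.t)
```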
2.3 Time Encoding & Temporal Geometry
Since the data constitutes an asynchronous, irregularly sampled time series, inter-event intervals carry structural meaning. We define the interval
$$\Delta t_{i} = t_{i} - t_{i-1}$$
and encode it as an embedding
$$\tau_{i} = \phi(\Delta t_{i}).$$
Possible implementations include:
- Log-scale buckets
- Continuous projection layers
- Positional-style time kernels
The key point is to let the model perceive the "tempo" of patterns, not merely their sequential order.
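For instance, the log-scale-bucket variant could look like the following sketch (assuming PyTorch; the bucket count, base, and embedding dimension are illustrative hyperparameters):

```python
import math
import torch
import torch.nn as nn

class LogBucketTimeEmbedding(nn.Module):
    """Embed the inter-event interval Δt_i via log-scale buckets."""
    def __init__(self, n_buckets: int = 32, dim: int = 64, base: float = 2.0):
        super().__init__()
        self.n_buckets, self.base = n_buckets, base
        self.emb = nn.Embedding(n_buckets, dim)

    def forward(self, delta_t: torch.Tensor) -> torch.Tensor:
        # bucket index = floor(log_base(Δt + 1)), clipped to the table size
        idx = torch.log(delta_t.clamp(min=0) + 1.0) / math.log(self.base)
        idx = idx.long().clamp(0, self.n_buckets - 1)
        return self.emb(idx)  # τ_i
```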
2.4 Event Token Representation
We define an event-to-vector transformation
$$f : e_i \mapsto \mathbf{e}_i \in \mathbb{R}^{d},$$
decomposed into components
$$\mathbf{e}_i = \text{type-emb}(x_i) + \text{feature-emb}(x_i) + \text{value-proj}(v_i) + \tau_i.$$
Component descriptions:
- type-emb — Indicates whether the token is an asset-event or macro-event
- feature-emb — Distinguishes signal types such as Feature-ID, Regime-ID
- value-proj — Handles Boolean / discrete / continuous values in a unified vector form
- $\tau_i$ — Encodes the inter-event interval (Section 2.3)
This structure enables development teams to directly map real features to token embeddings.
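A sketch of this composition (assuming PyTorch and summation as the fusion rule; the vocabulary sizes and the reuse of the `LogBucketTimeEmbedding` sketch from Section 2.3 are assumptions):

```python
import torch
import torch.nn as nn

class EventTokenEmbedding(nn.Module):
    """e_i = type-emb(x_i) + feature-emb(x_i) + value-proj(v_i) + tau_i."""
    def __init__(self, n_types: int, n_features: int, dim: int = 64):
        super().__init__()
        self.type_emb = nn.Embedding(n_types, dim)        # asset-event vs. macro-event
        self.feature_emb = nn.Embedding(n_features, dim)  # Feature-ID / Regime-ID
        self.value_proj = nn.Linear(1, dim)               # Boolean/discrete/continuous -> vector
        self.time_emb = LogBucketTimeEmbedding(dim=dim)   # tau_i, sketched in Section 2.3

    def forward(self, type_id, feature_id, value, delta_t):
        return (self.type_emb(type_id)
                + self.feature_emb(feature_id)
                + self.value_proj(value.float().unsqueeze(-1))
                + self.time_emb(delta_t))
```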
2.5 Sparsity Awareness & Event Importance
Given the large number of features, of which only a subset is expected to carry causal significance in any specific context, we introduce a feature-gating function
$$g(x_i) \in (0, 1),$$
applied as a scaling factor
$$\tilde{\mathbf{e}}_i = g(x_i) \cdot \mathbf{e}_i,$$
with an $L_1$-style regularizer to encourage sparsity:
$$\mathcal{L}_{\text{sparse}} = \lambda \sum_{x} g(x).$$
This does not force feature reduction, but allows the representation to gradually self-select important features.
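A sketch of the gate (assuming PyTorch; one learnable scalar per feature and a plain $L_1$ penalty are assumptions about the exact form):

```python
import torch
import torch.nn as nn

class FeatureGate(nn.Module):
    """Per-feature scalar gate g(x_i) in (0,1) with an L1 sparsity penalty."""
    def __init__(self, n_features: int):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_features))

    def forward(self, feature_id: torch.Tensor, token: torch.Tensor) -> torch.Tensor:
        g = torch.sigmoid(self.logits[feature_id])  # g(x_i)
        return g.unsqueeze(-1) * token              # scaled embedding g(x_i) * e_i

    def sparsity_loss(self, lam: float = 1e-4) -> torch.Tensor:
        # L_sparse = λ · Σ_x g(x); added to the training objective
        return lam * torch.sigmoid(self.logits).sum()
```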
2.6 Implementation-Ready View
At the system level, a single token can be viewed as JSON, for example:
```json
{
  "t": 1712001234,
  "type": "asset_event",
  "feature": "feature_X_217",
  "value": 1,
  "delta_t": 5400
}
```
Mapping this token through the embedding layers described above yields the vector $\mathbf{e}_i$, which is then fed into the sequence model.
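A sketch of that mapping step (the vocabularies here are hypothetical placeholders for the ids produced by the data-engineering layer):

```python
import torch

# Hypothetical vocabularies; real ids come from the data-engineering layer.
TYPE_VOCAB = {"asset_event": 0, "macro_event": 1}
FEATURE_VOCAB = {"feature_X_217": 217}

def token_to_tensors(tok: dict):
    """Map one JSON token to the index/value tensors the embedding layers expect."""
    return (torch.tensor(TYPE_VOCAB[tok["type"]]),
            torch.tensor(FEATURE_VOCAB[tok["feature"]]),
            torch.tensor(float(tok["value"])),
            torch.tensor(float(tok["delta_t"])))
```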
In other words:
- Data engineering handles sequence construction
- The ML model operates only on embeddings and sequence layers
2.7 Connection to Sequence Model
Given the vector sequence
$$(\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_T),$$
the sequence model (e.g., a Transformer or TCN) learns contextual hidden states
$$h_i = \operatorname{SeqModel}(\mathbf{e}_1, \dots, \mathbf{e}_i).$$
This forms the foundation for:
- Learning pattern accumulation
- Performing regime-conditioned analysis downstream
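As a minimal sketch of the Transformer option (assuming PyTorch; the width, depth, and head count are placeholders, and the actual backbone is specified in the next section):

```python
import torch
import torch.nn as nn

dim = 64  # must match the token-embedding dimension
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
    num_layers=2,
)
tokens = torch.randn(1, 128, dim)  # (batch, num events, dim): the sequence (e_1, ..., e_T)
hidden = encoder(tokens)           # contextual states h_i consumed by downstream heads
```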
The next section formally describes the backbone architecture and training objective.