Event-Driven Representation Learning in Sparse Financial Time Series

Abstract

This research presents a conceptual framework for learning representations from sparse event sequences in financial time series. The approach merges asset-level and macro-level events into a unified sequence in order to investigate structural relationships between event-sequence patterns and price outcomes across market regimes. The work is exploratory; it does not aim to build an operational price prediction system.

I. Introduction & Problem Formulation

1.1 Motivation

Stock price behavior in financial markets does not arise from individual variables in isolation; rather, it reflects sequences of events that occur irregularly over time, including:

Corporate fundamental events
Strategy-based signals
Trading behavior indicators
Macroeconomic events such as quantitative easing (QE), quantitative tightening (QT), crises, and policy shocks

The central hypothesis of this research is:

While precise prediction of stock price direction may not be achievable, certain accumulations of event patterns may correlate with subsequent price trends under appropriate contextual conditions (macro regimes).

The research objective is therefore not operational price prediction but an investigation of whether structural patterns in event sequences can account for trend-level price outcomes in at least some cases.

1.2 Data Model & Notation

Let the market contain a set of assets

$$\mathcal{A} = \{a_1, a_2, \dots, a_M\}$$

For any asset $a \in \mathcal{A}$, we define the event set over the time interval $[0,T]$ as a sequence

$$\mathcal{S}^a = \{(t_i^a, x_i^a)\}_{i=1}^{N_a}$$

where

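$t_i^a \in [0, T]$ is the timestamp of the $i$-th event
$x_i^a$ is the event descriptor
$N_a$ is the number of events recorded for asset $a$

The intervals between consecutive events need not be uniform; the sequence is asynchronous and sparse.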

We define two categories of events:

$$x_i^a \in \underbrace{\mathcal{X}_{asset}}_{\text{asset-level events}} \cup \underbrace{\mathcal{X}_{macro}}_{\text{macro-level events}}$$

All events are merged into a single sequence, ordered chronologically

$$t_1^a \le t_2^a \le \cdots \le t_{N_a}^a$$
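A minimal sketch of this data model, assuming Python and a hypothetical `Event` record whose `category` field distinguishes $\mathcal{X}_{asset}$ from $\mathcal{X}_{macro}$. The descriptor is reduced to a string label, which is an assumption made for illustration, since the framework leaves $x_i^a$ abstract. The hypothetical helper `build_sequence` validates timestamps against $[0, T]$ and returns the chronologically ordered sequence.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical event record: (t_i, x_i) plus a category tag.
# The descriptor is a plain string label, for illustration only.
@dataclass(frozen=True)
class Event:
    t: float          # timestamp t_i in [0, T]
    category: str     # "asset" or "macro"
    descriptor: str   # simplified stand-in for x_i

CATEGORIES = {"asset", "macro"}

def build_sequence(events: List[Event], horizon: float) -> List[Event]:
    """Validate events and return them in chronological order.

    Raises ValueError for unknown categories or timestamps outside
    [0, horizon]. Irregular spacing between events is allowed.
    """
    for e in events:
        if e.category not in CATEGORIES:
            raise ValueError(f"unknown category: {e.category}")
        if not 0.0 <= e.t <= horizon:
            raise ValueError(f"timestamp {e.t} outside [0, {horizon}]")
    # sorted() is stable, so events sharing a timestamp keep their input order.
    return sorted(events, key=lambda e: e.t)

def is_chronological(seq: List[Event]) -> bool:
    """Check t_1 <= t_2 <= ... <= t_N."""
    return all(a.t <= b.t for a, b in zip(seq, seq[1:]))

raw = [
    Event(12.5, "asset", "earnings_beat"),
    Event(3.0, "macro", "QE_announcement"),
    Event(7.25, "asset", "insider_buy"),
]
seq = build_sequence(raw, horizon=30.0)
print([e.descriptor for e in seq])
# ['QE_announcement', 'insider_buy', 'earnings_beat']
```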

1.3 Macro Events as Shared Tokens

Let $\mathcal{M}$ denote the set of $K$ macroeconomic events, shared across all assets

$$\mathcal{M} = \{(t_j^{macro}, m_j)\}_{j=1}^{K}$$

When constructing the sequence for asset $a$:

$$\tilde{\mathcal{S}}^a = \text{merge-sort}(\mathcal{S}^a, \mathcal{M})$$

This means:

Macro events are interleaved into the same sequence
They carry a type identifier distinct from that of asset events
They serve as structural context tokens

This enables the model to learn relationships between micro-level event sequences and macro-level context within a unified sequence.
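The merge-sort step can be sketched with Python's standard `heapq.merge`, which lazily merges inputs that are already sorted. The sketch assumes both $\mathcal{S}^a$ and $\mathcal{M}$ are already in timestamp order and represents each event as a hypothetical `(t, type_id, descriptor)` tuple, where `type_id` plays the role of the distinct type identifier. Placing a macro event before an asset event with the same timestamp is an arbitrary tie-breaking choice made here, not something the framework specifies; the helper names are likewise hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical type identifiers separating macro tokens from asset tokens.
MACRO, ASSET = 0, 1

Event = Tuple[float, int, str]  # (timestamp, type_id, descriptor)

def merge_with_macro(asset_events: List[Event],
                     macro_events: List[Event]) -> List[Event]:
    """Merge one asset's sorted events with the shared sorted macro events.

    Ties on timestamp are broken by type_id, so macro events come first
    (an illustrative choice).
    """
    return list(heapq.merge(asset_events, macro_events,
                            key=lambda e: (e[0], e[1])))

def build_all(asset_streams: Dict[str, List[Event]],
              macro_events: List[Event]) -> Dict[str, List[Event]]:
    # The same macro list is reused, read-only, for every asset.
    return {a: merge_with_macro(evts, macro_events)
            for a, evts in asset_streams.items()}

macro = [(5.0, MACRO, "QT_start"), (20.0, MACRO, "policy_shock")]
streams = {
    "AAA": [(1.0, ASSET, "earnings_beat"), (5.0, ASSET, "buyback")],
    "BBB": [(18.0, ASSET, "downgrade")],
}
merged = build_all(streams, macro)
print([d for _, _, d in merged["AAA"]])
# ['earnings_beat', 'QT_start', 'buyback', 'policy_shock']
```

Because the macro list is shared rather than copied into each asset's raw data, the same macro token appears at the same position in time for every asset, which is what lets it act as common context.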

1.4 Outcome Event Definition

We define an outcome function derived from stock prices

$$y^a = g(P^a(t))$$

For example, with a horizon $\Delta$ and thresholds $\tau_{down} < \tau_{up}$

$$y^a = \begin{cases} 1, & \text{if return over } [t,t+\Delta] \ge \tau_{up}\\ -1, & \text{if return over } [t,t+\Delta] \le \tau_{down}\\ 0, & \text{otherwise} \end{cases}$$

This implies:

We do not attempt to predict exact prices
Instead, we define outcome events (price-movement events) as discrete sparse signals

The model's objective is not to predict every price movement, but to examine whether certain preceding event sequences tend to correlate with $y^a$.
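The three-class labeling rule above can be sketched as a small function over a price series. The sketch assumes prices on a regular integer index, a simple return $P(t+\Delta)/P(t) - 1$, and illustrative thresholds of $\pm 5\%$; the framework does not fix the return definition, $\Delta$, $\tau_{up}$ or $\tau_{down}$, and `outcome_label` is a hypothetical helper.

```python
from typing import List, Optional

def outcome_label(prices: List[float], t: int, delta: int,
                  tau_up: float, tau_down: float) -> Optional[int]:
    """Return 1, -1 or 0 for the window [t, t + delta].

    Returns None when the window runs past the end of the data.
    Uses the simple return P(t + delta) / P(t) - 1 (an assumption).
    """
    if t + delta >= len(prices):
        return None
    r = prices[t + delta] / prices[t] - 1.0
    if r >= tau_up:
        return 1
    if r <= tau_down:
        return -1
    return 0

prices = [100.0, 102.0, 99.0, 110.0, 108.0, 95.0]
labels = [outcome_label(prices, t, delta=2, tau_up=0.05, tau_down=-0.05)
          for t in range(len(prices))]
print(labels)
# [0, 1, 1, -1, None, None]
```

With wider thresholds most windows fall into the 0 class, which is what makes the nonzero outcome events sparse.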

1.5 Learning Objective

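We define an event-to-embedding transformation

$$e_i = f_{\theta}(x_i, \Delta t_i)$$

where $\Delta t_i = t_i - t_{i-1}$ is the time elapsed since the preceding event in $\tilde{\mathcal{S}}^a$, and a sequence model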

$$h = F_{\Theta}(e_1,\dots,e_n)$$

The model then estimates

$$\hat{p}(y^a \mid \tilde{\mathcal{S}}^a)$$
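The toy forward pass below, in plain Python, is one way to make the roles of $f_{\theta}$, $F_{\Theta}$ and $\hat{p}$ concrete. Every architectural detail is an assumption for illustration: $f_{\theta}$ is a per-token lookup table concatenated with a $\log(1 + \Delta t_i)$ feature (taking $t_0 = 0$), $F_{\Theta}$ is mean pooling, and $\hat{p}$ is a softmax over a linear map of $h$ onto the classes $\{-1, 0, 1\}$. The names `table`, `W`, `b` and the vocabulary are hypothetical, and the parameters are random and untrained, so the output is only a shape check.

```python
import math
import random
from typing import Dict, List, Tuple

CLASSES = (-1, 0, 1)  # outcome labels y

def time_deltas(timestamps: List[float]) -> List[float]:
    """Delta t_i = t_i - t_{i-1}, taking t_0 = 0 (an assumption)."""
    prev = 0.0
    deltas = []
    for t in timestamps:
        deltas.append(t - prev)
        prev = t
    return deltas

def embed(token: str, dt: float, table: Dict[str, List[float]]) -> List[float]:
    """Toy f_theta: token vector plus a log-scaled time-gap feature."""
    return table[token] + [math.log1p(dt)]

def forward(seq: List[Tuple[float, str]], table: Dict[str, List[float]],
            W: List[List[float]], b: List[float]) -> List[float]:
    """Return p_hat(y | sequence) over CLASSES."""
    dts = time_deltas([t for t, _ in seq])
    embs = [embed(tok, dt, table) for (_, tok), dt in zip(seq, dts)]
    # Toy F_Theta: mean-pool the embeddings into one vector h.
    h = [sum(col) / len(embs) for col in zip(*embs)]
    logits = [sum(w * x for w, x in zip(row, h)) + bk for row, bk in zip(W, b)]
    # Numerically stable softmax.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

random.seed(0)
dim = 4
vocab = ["earnings_beat", "buyback", "QT_start", "policy_shock"]
table = {tok: [random.gauss(0.0, 0.1) for _ in range(dim)] for tok in vocab}
W = [[random.gauss(0.0, 0.1) for _ in range(dim + 1)] for _ in CLASSES]
b = [0.0] * len(CLASSES)

seq = [(1.0, "earnings_beat"), (5.0, "QT_start"),
       (5.0, "buyback"), (20.0, "policy_shock")]
p_hat = forward(seq, table, W, b)
print(dict(zip(CLASSES, p_hat)))
```

Mean pooling discards event order, which is exactly the structure the framework is interested in; it is used here only to keep the sketch short, and an order-aware model such as a recurrent network or a Transformer would be the more natural choice for $F_{\Theta}$.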

The training objective is to learn representations by minimizing a loss $\mathcal{L}$ between the observed outcome and the predicted distribution

$$\Theta^* = \arg\min_{\Theta} \mathcal{L}(y^a, \hat{p}(y^a \mid \tilde{\mathcal{S}}^a))$$

This objective rests on the hypothesis that meaningful patterns emerge from interactions between micro-level and macro-level events rather than from individual variables in isolation.
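The loss $\mathcal{L}$ is left unspecified. For a three-class outcome, categorical cross-entropy (the negative log-likelihood of the observed label) is one common choice; the sketch below assumes it and averages over a batch of (label, predicted distribution) pairs. The helper `cross_entropy` is hypothetical, and minimizing it over model parameters would in practice rely on automatic differentiation, which is beyond this sketch.

```python
import math
from typing import Sequence

CLASSES = (-1, 0, 1)  # position of each label in a predicted distribution

def cross_entropy(labels: Sequence[int], probs: Sequence[Sequence[float]],
                  eps: float = 1e-12) -> float:
    """Mean of -log p_hat(y_true); eps guards against log(0)."""
    total = 0.0
    for y, p in zip(labels, probs):
        total -= math.log(max(p[CLASSES.index(y)], eps))
    return total / len(labels)

uniform = [[1 / 3, 1 / 3, 1 / 3]] * 2
confident = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
print(cross_entropy([1, -1], uniform))    # log(3), about 1.0986
print(cross_entropy([1, -1], confident))  # 0.0
```

A uniform predictor scores $\log 3$ regardless of the labels, which gives a natural reference point when asking whether learned representations carry any signal at all.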

1.6 Research Focus (Not a Prediction System)

This research focuses on:

Determining whether learned representations exhibit signals of event-based patterns
Evaluating these signals separately within each market regime
Carrying out the analysis through the Train-Global → Analyze-Per-Era framework

The goal is not to build a trading model directly, but to explore whether structural event sequences carry statistically meaningful information about subsequent price outcomes.
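The Train-Global → Analyze-Per-Era framework is only named in this section, so the following is one possible reading rather than the procedure itself: a single model is trained on pooled data from all periods, and its predictions are then grouped by a regime (era) label so that metrics can be compared across regimes. The era names, the `(era, y_true, y_pred)` record layout, the helper `per_era_report`, and the use of accuracy against a majority-class baseline are all assumptions; the records are toy inputs, not results.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# (era, y_true, y_pred), with y_pred produced by one globally trained model
Record = Tuple[str, int, int]

def per_era_report(records: Iterable[Record]) -> Dict[str, Dict[str, float]]:
    """Group predictions by era and compare accuracy to a majority baseline."""
    by_era: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for era, y_true, y_pred in records:
        by_era[era].append((y_true, y_pred))
    report = {}
    for era, pairs in by_era.items():
        n = len(pairs)
        accuracy = sum(yt == yp for yt, yp in pairs) / n
        # Baseline: always predict the most frequent true label in this era.
        majority = Counter(yt for yt, _ in pairs).most_common(1)[0][1] / n
        report[era] = {"n": n, "accuracy": accuracy,
                       "majority_baseline": majority}
    return report

records = [
    ("QE", 1, 1), ("QE", 1, 1), ("QE", 0, 1), ("QE", 1, 0),
    ("QT", -1, -1), ("QT", 0, 0), ("QT", 0, -1),
]
for era, stats in per_era_report(records).items():
    print(era, stats)
```

Comparing each era against its own baseline matters because label frequencies can shift between regimes; a model that looks accurate in one era may simply be tracking that era's dominant outcome.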