responsive HTML
README.md CHANGED
@@ -31,12 +31,20 @@ The model has been trained on a mixture of 2.36 trillion time series data points
 
 ---
 
+<div style="width: 80%; margin: auto; padding: 1rem;">
+<img src="figures/architecture.png" alt="model architecture" style="width: 100%; height: auto;" />
+<em style="display: block; margin-top: 0.5rem; text-align: center;">
+<strong>Overview of the Toto-Open-Base-1.0 model architecture:</strong>
+Multivariate input time series of <code>L</code> steps are scaled using causal patch-based instance normalization,
+transformed into patch embeddings, and passed through a decoder-only transformer stack. The transformed features are unembedded
+and passed through a Student-T mixture model (<em>Section: Probabilistic Prediction</em>) which generates probabilistic
+next-patch predictions.
+<strong>B.</strong> The patch embedding takes as input a time series of <code>M</code> channels by <code>L</code> time steps.
+It divides the time dimension into patches of size <code>P</code> and projects these linearly into an embedding space of
+latent dimension <code>D</code>. This results in an output of size <code>M × (L/P) × D</code> which is fed to the transformer decoder.
+<strong>C.</strong> The transformer stack contains <code>F</code> identical segments. Each segment contains <code>N</code> time-wise
+transformer blocks followed by one channel-wise block.
+</em>
 </div>
 
 ## Key Features - TODO develop those or remove them
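As a reading aid for panel B of the caption above, here is a minimal sketch of the described shape arithmetic (M channels × L steps → M × (L/P) × D). The tensor sizes, variable names, and the plain `nn.Linear` projection are illustrative assumptions, not the repository's actual implementation:

```python
import torch
from torch import nn

# Illustrative sizes (assumptions, not Toto's real hyperparameters):
M, L, P, D = 4, 96, 16, 32   # channels, time steps, patch size, latent dim
assert L % P == 0            # the caption assumes the time axis splits evenly into patches

series = torch.randn(M, L)               # multivariate input: M channels x L time steps
patches = series.reshape(M, L // P, P)   # split the time axis into L/P patches of size P
embed = nn.Linear(P, D)                  # linear projection of each patch into the latent space
tokens = embed(patches)                  # shape: M x (L/P) x D, fed to the transformer decoder
print(tokens.shape)                      # torch.Size([4, 6, 32])
```

Per panel C of the caption, the decoder stack would then apply F identical segments, each consisting of N time-wise transformer blocks followed by one channel-wise block, to these tokens.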