responsive HTML
README.md CHANGED
@@ -31,12 +31,20 @@ The model has been trained on a mixture of 2.36 trillion time series data points
 
 ---
 
+<div style="width: 80%; margin: auto; padding: 1rem;">
+<img src="figures/architecture.png" alt="model architecture" style="width: 100%; height: auto;" />
+<em style="display: block; margin-top: 0.5rem; text-align: center;">
+<strong>Overview of the Toto-Open-Base-1.0 model architecture:</strong>
+Multivariate input time series of <code>L</code> steps are scaled using causal patch-based instance normalization,
+transformed into patch embeddings, and passed through a decoder-only transformer stack. The transformed features are unembedded
+and passed through a Student-T mixture model (<em>Section: Probabilistic Prediction</em>) which generates probabilistic
+next-patch predictions.
+<strong>B.</strong> The patch embedding takes as input a time series of <code>M</code> channels by <code>L</code> time steps.
+It divides the time dimension into patches of size <code>P</code> and projects these linearly into an embedding space of
+latent dimension <code>D</code>. This results in an output of size <code>M × (L/P) × D</code> which is fed to the transformer decoder.
+<strong>C.</strong> The transformer stack contains <code>F</code> identical segments. Each segment contains <code>N</code> time-wise
+transformer blocks followed by one channel-wise block.
+</em>
 </div>
 
 ## Key Features - TODO develop those or remove them
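As a reading aid for panel B of the caption above, here is a minimal sketch of the described shape arithmetic (M channels × L steps → M × (L/P) × D). The tensor sizes, variable names, and the plain `nn.Linear` projection are illustrative assumptions, not the repository's actual implementation:

```python
import torch
from torch import nn

# Illustrative sizes (assumptions, not Toto's real hyperparameters):
M, L, P, D = 4, 96, 16, 32   # channels, time steps, patch size, latent dim
assert L % P == 0            # the caption assumes the time axis splits evenly into patches

series = torch.randn(M, L)               # multivariate input: M channels x L time steps
patches = series.reshape(M, L // P, P)   # split the time axis into L/P patches of size P
embed = nn.Linear(P, D)                  # linear projection of each patch into the latent space
tokens = embed(patches)                  # shape: M x (L/P) x D, fed to the transformer decoder
print(tokens.shape)                      # torch.Size([4, 6, 32])
```

Per panel C of the caption, the decoder stack would then apply F identical segments, each consisting of N time-wise transformer blocks followed by one channel-wise block, to these tokens.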