annamonica committed
Commit 4ed4242 · verified · 1 Parent(s): 56b3c57

responsive HTML

Files changed (1)
  1. README.md +14 -6
README.md CHANGED
@@ -31,12 +31,20 @@ The model has been trained on a mixture of 2.36 trillion time series data points
 
 ---
 
-
-<!-- ![model architecture](figures/architecture.png) -->
-<!-- <img src="figures/architecture.png" alt="model architecture" width="920"/> -->
-<div style="text-align: center; width: 800px; margin: auto;">
-<img src="figures/architecture.png" alt="model architecture" style="width: 100%;"/>
-<em align="center" width="800"><strong>Overview of the Toto-Open-Base-1.0 model architecture:</strong> Multivariate input time series of <code>L</code> steps are scaled using causal patch-based instance normalization, transformed into patch embeddings, and passed through a decoder-only transformer stack. The transformed features are unembedded and passed through a Student-T mixture model (<em>Section: Probabilistic Prediction</em>) which generates probabilistic next-patch predictions. <strong>B.</strong> The patch embedding takes as input a time series of <code>M</code> channels by <code>L</code> time steps. It divides the time dimension into patches of size <code>P</code> and projects these linearly into an embedding space of latent dimension <code>D</code>. This results in an output of size <code>M × (L/P) × D</code> which is fed to the transformer decoder. <strong>C.</strong> The transformer stack contains <code>F</code> identical segments. Each segment contains <code>N</code> time-wise transformer blocks followed by one channel-wise block.</em>
+<div style="width: 80%; margin: auto; padding: 1rem;">
+<img src="figures/architecture.png" alt="model architecture" style="width: 100%; height: auto;" />
+<em style="display: block; margin-top: 0.5rem; text-align: center;">
+<strong>Overview of the Toto-Open-Base-1.0 model architecture:</strong>
+Multivariate input time series of <code>L</code> steps are scaled using causal patch-based instance normalization,
+transformed into patch embeddings, and passed through a decoder-only transformer stack. The transformed features are unembedded
+and passed through a Student-T mixture model (<em>Section: Probabilistic Prediction</em>) which generates probabilistic
+next-patch predictions.
+<strong>B.</strong> The patch embedding takes as input a time series of <code>M</code> channels by <code>L</code> time steps.
+It divides the time dimension into patches of size <code>P</code> and projects these linearly into an embedding space of
+latent dimension <code>D</code>. This results in an output of size <code>M × (L/P) × D</code> which is fed to the transformer decoder.
+<strong>C.</strong> The transformer stack contains <code>F</code> identical segments. Each segment contains <code>N</code> time-wise
+transformer blocks followed by one channel-wise block.
+</em>
 </div>
 
 ## Key Features - TODO develop those or remove them
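
Panel B of the caption is concrete enough to check with a few lines of code. Below is a minimal PyTorch sketch, not the repository's implementation: the class name `PatchEmbedding` and its parameters are hypothetical, and only the tensor shapes follow the caption.

```python
import torch
import torch.nn as nn


class PatchEmbedding(nn.Module):
    # Hypothetical sketch of panel B: split the time axis into patches of
    # size P and project each patch linearly into a D-dimensional space.
    def __init__(self, patch_size: int, embed_dim: int):
        super().__init__()
        self.patch_size = patch_size
        self.proj = nn.Linear(patch_size, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (M, L) -- M channels by L time steps; assumes L % P == 0
        M, L = x.shape
        patches = x.reshape(M, L // self.patch_size, self.patch_size)  # (M, L/P, P)
        return self.proj(patches)  # (M, L/P, D), as stated in the caption


emb = PatchEmbedding(patch_size=16, embed_dim=512)
out = emb(torch.randn(7, 128))  # M=7 channels, L=128 steps
print(out.shape)  # torch.Size([7, 8, 512]) == M x (L/P) x D
```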
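Panel C's alternating structure can be sketched the same way. Again a hypothetical illustration: `Block` and `Segment` are invented names, the blocks are plain self-attention layers, and the real model's causal masking, feed-forward sublayers, and positional encodings are all omitted.

```python
import torch
import torch.nn as nn


class Block(nn.Module):
    # Stand-in transformer block: pre-norm self-attention with a residual
    # connection, stripped of everything not needed to show the axis trick.
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm(x)
        h, _ = self.attn(h, h, h)
        return x + h


class Segment(nn.Module):
    # One of the F identical segments from panel C:
    # N time-wise blocks followed by a single channel-wise block.
    def __init__(self, dim: int, n_timewise: int):
        super().__init__()
        self.time_blocks = nn.ModuleList(Block(dim) for _ in range(n_timewise))
        self.channel_block = Block(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (M, L/P, D) -- channels act as the batch for time-wise attention
        for blk in self.time_blocks:
            x = blk(x)
        # Transpose so attention mixes channels at each time patch instead
        x = self.channel_block(x.transpose(0, 1)).transpose(0, 1)
        return x


stack = nn.Sequential(*(Segment(dim=512, n_timewise=3) for _ in range(4)))  # F=4, N=3
print(stack(torch.randn(7, 8, 512)).shape)  # shape preserved: (M, L/P, D)
```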
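The Student-T mixture head from panel A maps directly onto `torch.distributions`. A sketch for a single output position with `K` components; the parameter values here are random placeholders, whereas the model would produce them from the unembedded features.

```python
import torch
from torch.distributions import Categorical, MixtureSameFamily, StudentT

# Hypothetical head outputs for one position, K mixture components
K = 3
logits = torch.randn(K)             # mixture weights (unnormalized)
df     = 2.0 + 5.0 * torch.rand(K)  # degrees of freedom, kept > 2 here
loc    = torch.randn(K)             # component locations
scale  = 0.1 + torch.rand(K)        # component scales, > 0

mixture = MixtureSameFamily(Categorical(logits=logits), StudentT(df, loc, scale))
samples = mixture.sample((1000,))   # probabilistic draws for the next value
print(samples.mean().item(), mixture.log_prob(torch.tensor(0.0)).item())
```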