model_id: Toto-Open-Base-1.0
tags:
- time-series-forecasting
- foundation models
- pretrained models
- time series foundation models
- time series
- time-series
- transformers
- forecasting
- safetensors
- apache-2.0
paper:
- "[Link to Paper]"
datasets:
- Salesforce/GiftEvalPretrain
- autogluon/chronos_datasets
leaderboards:
- GiftEval
- BOOM
license: apache-2.0
Toto-Open-Base-1.0
Toto (Time Series Optimized Transformer for Observability) is a time-series foundation model designed for multivariate time series forecasting, with a focus on observability metrics. Toto introduces architectural innovations and training recipes that enable it to efficiently handle the high-dimensional, sparse, and non-stationary time series that are hallmarks of the observability domain.
The model has been trained on a mixture of 2.36 trillion time series data points, 43% of which are taken from real-world observability metrics. Toto demonstrates state-of-the-art zero-shot performance on observability-specific tasks as well as top-ranking performance (as of DATE) on the multi-domain GiftEval time series forecasting benchmark.
Overview of the Toto-Open-Base-1.0 model architecture:
A. Multivariate input time series of L steps are scaled using causal patch-based instance normalization, transformed into patch embeddings, and passed through a decoder-only transformer stack. The transformed features are unembedded and passed through a Student-T mixture model (Section: Probabilistic Prediction), which generates probabilistic next-patch predictions.
B. The patch embedding takes as input a time series of M channels by L time steps. It divides the time dimension into patches of size P and projects these linearly into an embedding space of latent dimension D. This results in an output of size M × (L/P) × D, which is fed to the transformer decoder.
C. The transformer stack contains F identical segments. Each segment contains N time-wise transformer blocks followed by one channel-wise block.
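To make the patch embedding in panel B concrete, here is a minimal PyTorch sketch of the operation described above. The class and parameter names are illustrative, not taken from the Toto codebase; it only reproduces the patching of the time dimension into length-P windows and the linear projection to latent dimension D.

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Illustrative sketch of the patch embedding described in panel B (not the Toto implementation)."""
    def __init__(self, patch_size: int, d_model: int):
        super().__init__()
        self.patch_size = patch_size
        # Each length-P patch of a single channel is projected to a D-dimensional vector.
        self.proj = nn.Linear(patch_size, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, M channels, L time steps), with L divisible by P
        b, m, l = x.shape
        patches = x.reshape(b, m, l // self.patch_size, self.patch_size)
        return self.proj(patches)  # (batch, M, L/P, D)

# Example: M=4 channels, L=96 steps, P=16, D=256 -> embeddings of shape (1, 4, 6, 256),
# matching the M × (L/P) × D layout described in the caption.
emb = PatchEmbedding(patch_size=16, d_model=256)(torch.randn(1, 4, 96))
```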
Key Features
- Multi-Variate Time Series Support: uses Proportional Factorized Space-Time Attention to efficiently group multivariate features, reducing computational overhead while maintaining high accuracy.
- Tailored for Observability: observability metrics are machine-generated time series collected in near-real-time to monitor and optimize the performance and reliability of modern infrastructure and applications.
- Decoder-Only Transformer Architecture: supports variable prediction horizons and context lengths.
- Causal Patch-Wise Instance Normalization: improves forecasting performance and training stability in decoder-only models.
- Student-T Mixture Head for Point & Probabilistic Forecasting: models the complex, heavy-tailed distributions found in observability data (see the sketch after this list).
- Extensive Pretraining on Large-Scale Data: pretrained on 5–10× more data than leading time series foundation models, using a combination of synthetic, public, and observability-specific datasets.
- High-Dimensional Time Series Support: efficiently handles datasets with a large number of variables.
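As a rough illustration of what a Student-T mixture predictive distribution provides, the snippet below builds one with torch.distributions and draws samples from it to obtain both probabilistic and point forecasts. The component count and parameter values are made up for the example; in the model, these parameters would be produced by the mixture head for each predicted value.

```python
import torch
from torch.distributions import StudentT, Categorical, MixtureSameFamily

# Made-up parameters for a 3-component Student-T mixture over a single forecast value.
weights_logits = torch.zeros(3)             # mixture weights (uniform here)
df    = torch.tensor([3.0, 5.0, 10.0])      # degrees of freedom per component
loc   = torch.tensor([0.0, 0.1, -0.2])      # component locations
scale = torch.tensor([1.0, 0.5, 2.0])       # component scales

predictive = MixtureSameFamily(
    Categorical(logits=weights_logits),
    StudentT(df, loc, scale),
)

samples = predictive.sample((1000,))        # Monte Carlo samples -> probabilistic forecast
point = samples.median()                    # robust point forecast from the predictive distribution
interval = torch.quantile(samples, torch.tensor([0.1, 0.9]))  # e.g. an 80% prediction interval
```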
Resources
- Paper: "[Link to arxiv paper]"
- Repository: "[Link to github repo]"
- Blog Post: "[Link to Datadog BlogPost]"
- BOOM: Dataset card
Usage
Installation
# Clone the repository
git clone https://github.com/DataDogFutureOpenSource/TOTO.git
# Navigate to the project directory
cd TOTO
# Install the required dependencies
pip install -r requirements.txt
Running Inference
For a step-by-step guide on running inferences with Toto, please refer to our GitHub repository's inference tutorial notebook.
Training Details
Pretraining Data
| Dataset |
|---|
| GiftEval Pretrain |
| Chronos (Note: we use a subset of the Chronos dataset to avoid contamination with the GiftEval benchmark.) |
| Synthetic |
| Observability |
For more details about the pretraining data and preprocessing steps, please refer to the paper or the GitHub repository.
Training Hyperparameters
The training hyperparameters for Toto are defined in the YAML configuration file located in our GitHub repository.
Results
For more detailed information, please refer to the results section in our paper.
Citation
If you use Toto in your research or applications, please cite us using the following:
@article{Toto-Open-Base-1.0,
title={TOTO: Time Series Optimized Transformer for Observability},
author={Your Author Names Here},
journal={arXiv preprint arXiv:XXXX.XXXXX},
year={2025},
url={https://arxiv.org/abs/XXXX.XXXXX}
}