---
model_id: Toto-Open-Base-1.0
tags:
  - time-series-forecasting
  - foundation models
  - pretrained models
  - time series foundation models
  - time series
  - time-series
  - transformers
  - forecasting
  - safetensors
  - apache-2.0
paper:
  - "[Link to Paper]"
datasets:
  - Salesforce/GiftEvalPretrain
  - autogluon/chronos_datasets
leaderboards:
  - GiftEval
  - BOOM
license: apache-2.0
---

# Toto-Open-Base-1.0

Toto (Time Series Optimized Transformer for Observability) is a time-series foundation model designed for multivariate time series forecasting, with a focus on observability metrics. Toto combines new architectural innovations and training recipes that enable it to efficiently handle the high-dimensional, sparse, and non-stationary time series that are hallmarks of the observability domain.

The model was trained on a mixture of 2.36 trillion time series data points, 43% of which come from real-world observability metrics. Toto demonstrates state-of-the-art zero-shot performance on observability-specific tasks, as well as top-ranking performance (as of DATE) on the multi-domain GiftEval time series forecasting benchmark.


*Figure: Overview of the Toto-Open-Base-1.0 model architecture.* **A.** Multivariate input time series of L steps are scaled using causal patch-based instance normalization, transformed into patch embeddings, and passed through a decoder-only transformer stack. The transformed features are unembedded and passed through a Student-T mixture model (Section: Probabilistic Prediction), which generates probabilistic next-patch predictions. **B.** The patch embedding takes as input a time series of M channels by L time steps. It divides the time dimension into patches of size P and projects these linearly into an embedding space of latent dimension D, giving an output of size M × (L/P) × D that is fed to the transformer decoder. **C.** The transformer stack contains F identical segments. Each segment contains N time-wise transformer blocks followed by one channel-wise block.
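To make the patch embedding in panel B concrete, here is a minimal PyTorch sketch (illustrative only, not the repository's implementation; the class and argument names are our own). It splits an M-channel, L-step series into non-overlapping patches of size P and linearly projects each patch to a D-dimensional embedding:

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Illustrative patch embedding: (batch, M, L) -> (batch, M, L/P, D)."""

    def __init__(self, patch_size: int, embed_dim: int):
        super().__init__()
        self.patch_size = patch_size
        # One linear projection shared across channels and patches.
        self.proj = nn.Linear(patch_size, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        batch, n_channels, seq_len = x.shape
        assert seq_len % self.patch_size == 0, "L must be divisible by P"
        # Split the time axis into L/P patches of size P.
        x = x.reshape(batch, n_channels, seq_len // self.patch_size, self.patch_size)
        # Project each patch into the latent dimension D.
        return self.proj(x)

# Example: M=4 channels, L=96 steps, P=16, D=256 -> torch.Size([1, 4, 6, 256])
emb = PatchEmbedding(patch_size=16, embed_dim=256)
print(emb(torch.randn(1, 4, 96)).shape)
```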

## Key Features

- **Multivariate Time Series Support:** uses Proportional Factorized Space-Time Attention to efficiently group multivariate features, reducing computational overhead while maintaining high accuracy.
- **Tailored for Observability:** observability metrics are machine-generated time series collected in near-real-time to monitor and optimize the performance and reliability of modern infrastructure and applications.
- **Decoder-Only Transformer Architecture:** supports variable prediction horizons and context lengths.
- **Causal Patch-Wise Instance Normalization:** improves forecasting performance and training stability in decoder-only models (a simplified sketch follows this list).
- **Student-T Mixture Head for Point & Probabilistic Forecasting:** models the complex, heavy-tailed distributions found in observability data (see the second sketch after this list).
- **Extensive Pretraining on Large-Scale Data:** pretrained on 5–10× more data than leading time series foundation models, using a combination of synthetic, public, and observability-specific datasets.
- **High-Dimensional Time Series Support:** efficiently handles datasets with a large number of variables.
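Below is a simplified sketch of the causal patch-wise instance normalization idea, written under our own assumptions rather than taken from the released code: each patch is standardized with mean and variance statistics accumulated only over the current and earlier time steps, so the normalization never peeks at future values.

```python
import torch

def causal_patch_instance_norm(x: torch.Tensor, patch_size: int, eps: float = 1e-5):
    """Illustrative causal normalization of a (batch, channels, length) series.

    Each patch is scaled with mean/variance computed from all time steps up to
    and including the end of that patch (never from future patches).
    """
    batch, channels, length = x.shape
    # Running (causal) sums along the time axis.
    csum = x.cumsum(dim=-1)
    csum_sq = (x ** 2).cumsum(dim=-1)
    counts = torch.arange(1, length + 1, device=x.device, dtype=x.dtype)
    # Index of the last time step in each patch.
    ends = torch.arange(patch_size - 1, length, patch_size, device=x.device)
    mean = csum[..., ends] / counts[ends]                 # (batch, channels, L/P)
    var = csum_sq[..., ends] / counts[ends] - mean ** 2
    # Broadcast each patch's statistics over its P time steps.
    mean = mean.repeat_interleave(patch_size, dim=-1)
    std = (var.clamp_min(0) + eps).sqrt().repeat_interleave(patch_size, dim=-1)
    return (x - mean) / std

# Example: normalize a 2-channel series of length 64 with patch size 16.
print(causal_patch_instance_norm(torch.randn(1, 2, 64), patch_size=16).shape)
```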

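And here is a minimal sketch of a Student-T mixture output head built from `torch.distributions` (again illustrative; the number of components, the parameterization, and the layer names are assumptions, not the released implementation). It maps each decoder feature vector to mixture weights, degrees of freedom, locations, and scales, from which both point forecasts (e.g. the mixture mean) and probabilistic forecasts (samples or quantiles) can be read off.

```python
import torch
import torch.nn as nn
import torch.distributions as D

class StudentTMixtureHead(nn.Module):
    """Illustrative head mapping d_model features to a K-component Student-T mixture."""

    def __init__(self, d_model: int, n_components: int = 4):
        super().__init__()
        # Per component: mixture logit, degrees of freedom, location, scale.
        self.proj = nn.Linear(d_model, 4 * n_components)

    def forward(self, h: torch.Tensor) -> D.Distribution:
        logits, df, loc, scale = self.proj(h).chunk(4, dim=-1)
        components = D.StudentT(
            df=2.0 + nn.functional.softplus(df),        # keep df > 2 so the variance exists
            loc=loc,
            scale=nn.functional.softplus(scale) + 1e-6,  # scales must be positive
        )
        return D.MixtureSameFamily(D.Categorical(logits=logits), components)

head = StudentTMixtureHead(d_model=256)
dist = head(torch.randn(8, 256))               # one predictive distribution per element
point_forecast = dist.mean                     # point forecast from the mixture mean
samples = dist.sample((100,))                  # Monte Carlo samples for probabilistic forecasts
loss = -dist.log_prob(torch.randn(8)).mean()   # negative log-likelihood training objective
```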
## Resources

- Paper: [Link to arXiv paper]
- Repository: [Link to GitHub repo]
- Blog Post: [Link to Datadog blog post]
- BOOM: Dataset card

## Usage

### Installation

```bash
# Clone the repository
git clone https://github.com/DataDogFutureOpenSource/TOTO.git

# Navigate to the project directory
cd TOTO

# Install the required dependencies
pip install -r requirements.txt
```

### Running Inference

For a step-by-step guide on running inference with Toto, please refer to the inference tutorial notebook in our GitHub repository.
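As a quick orientation only, loading the released checkpoint typically follows the familiar `from_pretrained` pattern. The import path and model id below are assumptions on our part and may not match the released code exactly, so defer to the tutorial notebook for the authoritative API.

```python
# Hypothetical sketch; see the tutorial notebook for the actual API.
from toto.model.toto import Toto  # assumed import path

# Load the pretrained checkpoint from the Hugging Face Hub (assumed model id).
model = Toto.from_pretrained("Datadog/Toto-Open-Base-1.0")
model.eval()

# Input formatting, prediction length, and sampling options for forecasting
# are covered step by step in the tutorial notebook.
```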

## Training Details

### Pretraining Data

Toto was pretrained on the following dataset sources:

- GiftEval Pretrain
- Chronos (note: we use a subset of the Chronos dataset to avoid contamination with the GiftEval benchmark)
- Synthetic
- Observability

For more details about the pretraining data and preprocessing steps, please refer to the paper or the GitHub repository.

### Training Hyperparameters

The training hyperparameters for Toto are defined in the YAML configuration file in our GitHub repository.

## Results

| Dataset  | CRPS | MASE |
|----------|------|------|
| BOOM     | TBD  | TBD  |
| GiftEval | TBD  | TBD  |

For more detailed information, please refer to the results section in our paper.

## Citation

If you use Toto in your research or applications, please cite us using the following:

```bibtex
@article{Toto-Open-Base-1.0,
  title={TOTO: Time Series Optimized Transformer for Observability},
  author={Your Author Names Here},
  journal={arXiv preprint arXiv:XXXX.XXXXX},
  year={2025},
  url={https://arxiv.org/abs/XXXX.XXXXX}
}
```