---
model_id: Toto-Open-Base-1.0
tags:
- time-series-forecasting
- foundation models
- pretrained models
- time series foundation models
- time-series
- transformers
- forecasting
- safetensors
- observability
datasets:
- Salesforce/GiftEvalPretrain
- autogluon/chronos_datasets
leaderboards:
- GiftEval
- BOOM
license: apache-2.0
pipeline_tag: time-series-forecasting
---
# Toto-Open-Base-1.0
Toto (Time Series Optimized Transformer for Observability) is a time-series foundation model designed for multivariate time series forecasting, with an emphasis on observability metrics. Toto efficiently handles the high-dimensional, sparse, and non-stationary data commonly encountered in observability scenarios.

## ⚡ Quick Start: Model Inference

Inference code is available on [GitHub](https://github.com/DataDog/toto).
### Installation

```bash
# Clone the repository
git clone https://github.com/DataDog/toto.git
cd toto

# Install dependencies
pip install -r requirements.txt
```
### 🚀 Inference Example
Here's how to quickly generate forecasts using Toto:

```python
import torch

from data.util.dataset import MaskedTimeseries
from inference.forecaster import TotoForecaster
from model.toto import Toto

DEVICE = 'cuda'

# Load the pre-trained Toto model
toto = Toto.from_pretrained('Datadog/Toto-Open-Base-1.0').to(DEVICE)

# Optional: compile the model for faster inference
toto.compile()

forecaster = TotoForecaster(toto.model)

# Example input series (7 variables, 4096 timesteps)
input_series = torch.randn(7, 4096).to(DEVICE)

# Timestamp metadata (placeholder values for this synthetic example)
timestamp_seconds = torch.zeros(7, 4096).to(DEVICE)
time_interval_seconds = torch.full((7,), 60 * 15).to(DEVICE)  # 15-minute intervals

inputs = MaskedTimeseries(
    series=input_series,
    padding_mask=torch.full_like(input_series, True, dtype=torch.bool),
    id_mask=torch.zeros_like(input_series),
    timestamp_seconds=timestamp_seconds,
    time_interval_seconds=time_interval_seconds,
)

# Generate probabilistic forecasts for the next 336 timesteps
forecast = forecaster.forecast(
    inputs,
    prediction_length=336,
    num_samples=256,
    samples_per_batch=256,
)

# Access the results
mean_prediction = forecast.mean
prediction_samples = forecast.samples
lower_quantile = forecast.quantile(0.1)
upper_quantile = forecast.quantile(0.9)
```
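The forecast object exposes both point and probabilistic outputs. As an illustration only, here is one way to visualize a prediction interval for a single variate; the `(variates, prediction_length)` tensor layout is an assumption here, so verify shapes against the tutorial notebook:

```python
import matplotlib.pyplot as plt

variate = 0  # which of the 7 input series to plot

# Assumed layout: (variates, prediction_length)
mean = mean_prediction[variate].detach().cpu().numpy()
lo = lower_quantile[variate].detach().cpu().numpy()
hi = upper_quantile[variate].detach().cpu().numpy()

plt.plot(mean, label="forecast mean")
plt.fill_between(range(len(mean)), lo, hi, alpha=0.3, label="10%-90% interval")
plt.legend()
plt.title("Toto forecast, variate 0")
plt.show()
```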
For detailed inference instructions, refer to the inference tutorial notebook.
### Performance Recommendations

For optimal speed and reduced memory usage, install [xFormers](https://github.com/facebookresearch/xformers) and [flash-attention](https://github.com/Dao-AILab/flash-attention), then set `use_memory_efficient` to `True`.
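Where the flag is passed depends on the version of the inference code, so consult the repository for the exact API. The call site below is hypothetical and shown only to illustrate the intent:

```python
# Hypothetical placement of the flag; check the Toto repository for
# where use_memory_efficient is actually accepted in your version.
forecast = forecaster.forecast(
    inputs,
    prediction_length=336,
    num_samples=256,
    samples_per_batch=256,
    use_memory_efficient=True,  # assumption: routes attention through xFormers/flash-attention kernels
)
```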
## 💾 Available Checkpoints

| Checkpoint | Parameters | Config | Size | Notes |
|---|---|---|---|---|
| Toto-Open-Base-1.0 | 151M | Config | 605 MB | Initial release with state-of-the-art performance |
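As a quick sanity check on a downloaded checkpoint, you can confirm the parameter count. This is a minimal sketch, assuming the loaded `Toto` object is a standard `torch.nn.Module` so that `parameters()` is available:

```python
from model.toto import Toto

toto = Toto.from_pretrained('Datadog/Toto-Open-Base-1.0')

# Count parameters; should be roughly 151M for Toto-Open-Base-1.0
num_params = sum(p.numel() for p in toto.parameters())
print(f"{num_params / 1e6:.1f}M parameters")
```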
## ✨ Key Features
- Zero-Shot Forecasting
- Multi-Variate Support
- Decoder-Only Transformer Architecture
- Probabilistic Predictions (Student-T mixture model; see the sketch after this list)
- Causal Patch-Wise Instance Normalization
- Extensive Pretraining on Large-Scale Data
- High-Dimensional Time Series Support
- Tailored for Observability Metrics
- State-of-the-Art Performance on GiftEval and BOOM
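For context on the probabilistic head: a Student-T mixture predicts, at each step, mixture weights together with per-component degrees of freedom, location, and scale. A generic form is sketched below (our notation; the exact parameterization is defined in the paper):

$$
p(y_t) = \sum_{k=1}^{K} \pi_{k,t}\, \mathrm{St}\!\left(y_t \mid \nu_{k,t},\, \mu_{k,t},\, \sigma_{k,t}\right), \qquad \sum_{k=1}^{K} \pi_{k,t} = 1
$$

The heavy tails of the Student-T components make this head robust to the spikes and outliers common in observability metrics, and quantiles such as `forecast.quantile(0.1)` in the example above summarize this predictive distribution.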
## 📚 Training Data Summary

- Observability Metrics: ~1 trillion points from Datadog internal systems (no customer data)
- Public Datasets: GiftEval Pretrain (`Salesforce/GiftEvalPretrain`) and Chronos (`autogluon/chronos_datasets`)
- Synthetic Data: ~1/3 of the training data
## 🔗 Additional Resources

- Research Paper (link to be added)
- [GitHub Repository](https://github.com/DataDog/toto)
- Blog Post
- BOOM Dataset
## 📖 Citation
If you use Toto in your research or applications, please cite us using the following:
```bibtex
@misc{toto2025,
  title={This Time is Different: An Observability Perspective on Time Series Foundation Models},
  author={TODO},
  year={2025},
  eprint={arXiv:TODO},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
```