---
model_id: Toto-Open-Base-1.0
tags:
- time-series-forecasting
- foundation models
- pretrained models
- time series foundation models
- time series
- time-series
- transformers
- forecasting
- safetensors
- observability
paper:
- Link to Paper
datasets:
- Salesforce/GiftEvalPretrain
- autogluon/chronos_datasets
leaderboards:
- GiftEval (if results are public) #TODO(Anna) check how to do that
- BOOM (if results are public) #TODO(Anna) check how to do that
license: apache-2.0
pipeline_tag: time-series-forecasting
---

# Toto-Open-Base-1.0

Toto (Time Series Optimized Transformer for [Observability](https://www.datadoghq.com/knowledge-center/observability/)) is a time-series foundation model designed for multi-variate time series forecasting, with an emphasis on observability metrics. Toto efficiently handles the high-dimensional, sparse, and non-stationary data commonly encountered in observability scenarios.

<div style="width: 100%; margin: auto; padding: 1rem;">
  <img src="figures/architecture.png" alt="model architecture" style="width: 100%; height: auto;" />
  <em style="display: block; margin-top: 0.5rem; text-align: center;">
    Overview of Toto-Open-Base-1.0 architecture.
  </em>
</div>

---

## ⚡ Quick Start: Model Inference

Inference code is available on [GitHub](https://github.com/DataDog/toto).

### Installation

```bash
# Clone the repository
git clone https://github.com/DataDog/toto.git
cd toto

# Install dependencies
pip install -r requirements.txt
```

### 🚀 Inference Example

Here's how to quickly generate forecasts using Toto:

```python
import torch

from data.util.dataset import MaskedTimeseries
from inference.forecaster import TotoForecaster
from model.toto import Toto

DEVICE = 'cuda'

# Load pre-trained Toto model
toto = Toto.from_pretrained('Datadog/Toto-Open-Base-1.0').to(DEVICE)

# Optional: compile model for enhanced speed
toto.compile()

forecaster = TotoForecaster(toto.model)

# Example input series (7 variables, 4096 timesteps)
input_series = torch.randn(7, 4096).to(DEVICE)

# Placeholder timestamps and a fixed 15-minute sampling interval per variable
timestamp_seconds = torch.zeros(7, 4096).to(DEVICE)
time_interval_seconds = torch.full((7,), 60 * 15).to(DEVICE)

inputs = MaskedTimeseries(
    series=input_series,
    padding_mask=torch.full_like(input_series, True, dtype=torch.bool),
    id_mask=torch.zeros_like(input_series),
    timestamp_seconds=timestamp_seconds,
    time_interval_seconds=time_interval_seconds,
)

# Generate forecasts for the next 336 timesteps
forecast = forecaster.forecast(
    inputs,
    prediction_length=336,
    num_samples=256,
    samples_per_batch=256,
)

# Access results
mean_prediction = forecast.mean
prediction_samples = forecast.samples
lower_quantile = forecast.quantile(0.1)
upper_quantile = forecast.quantile(0.9)
```

For detailed inference instructions, refer to the [inference tutorial notebook](https://github.com/DataDog/toto/blob/main/toto/notebooks/inference_tutorial.ipynb).
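
To sanity-check the outputs above, you can tabulate the point forecast alongside its quantile band. The sketch below assumes the `(variates, prediction_length)` tensor layout implied by the example; verify the actual shapes against the tutorial notebook. `pandas` is used only for display.

```python
import pandas as pd

# Continues from the example above. Squeeze away any leading batch dimension
# so the tensors are (variates, prediction_length); verify against the notebook.
mean = mean_prediction.squeeze().float().cpu()
p10 = lower_quantile.squeeze().float().cpu()
p90 = upper_quantile.squeeze().float().cpu()

variate = 0  # inspect the first of the 7 series
df = pd.DataFrame({
    "mean": mean[variate].numpy(),
    "p10": p10[variate].numpy(),
    "p90": p90[variate].numpy(),
})
df.index.name = "step"  # 0..335 over the 336-step horizon

# How often the point forecast falls inside the 10-90% band
coverage = ((df["p10"] <= df["mean"]) & (df["mean"] <= df["p90"])).mean()
print(f"mean inside the 10-90% band at {coverage:.0%} of steps")
print(df.head())
```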

### Performance Recommendations

- For optimal speed and reduced memory usage, install [xFormers](https://github.com/facebookresearch/xformers) and [flash-attention](https://github.com/Dao-AILab/flash-attention), then set `use_memory_efficient` to `True`. A sketch of the optional install follows.
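
A minimal install sketch for these optional dependencies. The package names are the usual PyPI ones (`xformers`, `flash-attn`); `flash-attn` compiles CUDA kernels, so it needs a CUDA toolchain compatible with your PyTorch build. Check each project's README if the build fails.

```bash
# Optional accelerated-attention backends
pip install xformers
# --no-build-isolation lets the build see the already-installed torch
pip install flash-attn --no-build-isolation
```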

---

### 💾 Available Checkpoints

| Checkpoint | Parameters | Config | Size | Notes |
|------------|------------|--------|------|-------|
| [Toto-Open-Base-1.0](https://huggingface.co/Datadog/Toto-Open-Base-1.0/blob/main/model.safetensors) | 151M | [Config](https://huggingface.co/Datadog/Toto-Open-Base-1.0/blob/main/config.json) | 605 MB | Initial release with SOTA performance |
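
If you need the raw checkpoint files rather than `Toto.from_pretrained`, they can be fetched directly with the standard `huggingface_hub` client, as in this sketch:

```python
from huggingface_hub import hf_hub_download

# Download the weights and config from the Hub; returns local cache paths.
weights_path = hf_hub_download(repo_id="Datadog/Toto-Open-Base-1.0", filename="model.safetensors")
config_path = hf_hub_download(repo_id="Datadog/Toto-Open-Base-1.0", filename="config.json")
print(weights_path, config_path)
```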

## ✨ Key Features

- **Zero-Shot Forecasting**
- **Multi-Variate Support**
- **Decoder-Only Transformer Architecture**
- **Probabilistic Predictions** via a Student-T mixture model (see the sketch after this list)
- **Causal Patch-Wise Instance Normalization**
- **Extensive Pretraining on Large-Scale Data**
- **High-Dimensional Time Series Support**
- **Tailored for Observability Metrics**
- **State-of-the-Art Performance** on [GiftEval](https://huggingface.co/spaces/Salesforce/GIFT-Eval) and [BOOM](https://huggingface.co/datasets/Datadog/BOOM)
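
The probabilistic head predicts a mixture of Student-T distributions, and outputs like `forecast.samples` above are Monte Carlo draws from it. The snippet below is a toy illustration of such a mixture using `torch.distributions`; it is not Toto's actual output head, and every parameter value is made up.

```python
import torch
from torch.distributions import Categorical, MixtureSameFamily, StudentT

# Toy Student-T mixture for a single forecast step: k components, each with
# its own degrees of freedom, location, and scale. In Toto these parameters
# would come from the model head; here they are random for illustration.
k = 4
logits = torch.randn(k)                                      # mixture weights (pre-softmax)
df = torch.nn.functional.softplus(torch.randn(k)) + 2.0      # df > 2 keeps variance finite
loc = torch.randn(k)                                         # component locations
scale = torch.nn.functional.softplus(torch.randn(k)) + 1e-3  # positive scales

mixture = MixtureSameFamily(Categorical(logits=logits), StudentT(df, loc, scale))

# Monte Carlo draws, analogous to forecast.samples; empirical quantiles of
# the draws give prediction intervals like forecast.quantile(0.1).
samples = mixture.sample((256,))
print(samples.quantile(torch.tensor([0.1, 0.5, 0.9])))
```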

---

## 📚 Training Data Summary

- **Observability Metrics:** ~1 trillion points from Datadog internal systems (no customer data)
- **Public Datasets** (a loading sketch follows this list):
  - [GiftEval Pretrain](https://huggingface.co/datasets/Salesforce/GiftEvalPretrain)
  - [Chronos datasets](https://huggingface.co/datasets/autogluon/chronos_datasets)
- **Synthetic Data:** ~1/3 of the training data
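
Both public corpora are hosted on the Hugging Face Hub, so they can be pulled with the standard `datasets` library. In this sketch the subset name `m4_hourly` is only an illustrative assumption; list the available configs first and substitute one that exists.

```python
from datasets import get_dataset_config_names, load_dataset

# The Chronos collection is split into many named subsets; enumerate them.
configs = get_dataset_config_names("autogluon/chronos_datasets")
print(configs[:10])

# "m4_hourly" is an illustrative choice; substitute any config printed above.
ds = load_dataset("autogluon/chronos_datasets", "m4_hourly", split="train")
print(ds)
```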

---

## 🔗 Additional Resources

- **Research Paper (To add)**
- **[GitHub Repository](https://github.com/DataDog/toto.git)**
- **[Blog Post](#TODO-link-to-blogpost)**
- **[BOOM Dataset](https://huggingface.co/datasets/Datadog/BOOM)**

---

## 📖 Citation

If you use Toto in your research or applications, please cite us using the following:

```bibtex
@misc{toto2025,
  title={This Time is Different: An Observability Perspective on Time Series Foundation Models},
  author={TODO},
  year={2025},
  eprint={arXiv:TODO},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
```