---
model_id: Toto-Open-Base-1.0
tags:
  - time-series-forecasting
  - foundation models
  - pretrained models
  - time series foundation models
  - time series
  - time-series
  - transformers
  - forecasting
  - safetensors
  - observability
datasets:
  - Salesforce/GiftEvalPretrain
  - autogluon/chronos_datasets
license: apache-2.0
pipeline_tag: time-series-forecasting
---

# Toto-Open-Base-1.0

Toto (Time Series Optimized Transformer for Observability) is a time-series foundation model designed for multivariate time series forecasting, with an emphasis on observability metrics. Toto efficiently handles the high-dimensional, sparse, and non-stationary data commonly encountered in observability scenarios.

*Figure: Overview of the Toto-Open-Base-1.0 architecture.*

## ⚡ Quick Start: Model Inference

Inference code is available on [GitHub](https://github.com/DataDog/toto).

### Installation

```bash
# Clone the repository
git clone https://github.com/DataDog/toto.git
cd toto

# Install dependencies
pip install -r requirements.txt
```

### 🚀 Inference Example

Here's how to quickly generate forecasts using Toto:

```python
import torch
from data.util.dataset import MaskedTimeseries
from inference.forecaster import TotoForecaster
from model.toto import Toto

DEVICE = 'cuda'

# Load pre-trained Toto model
toto = Toto.from_pretrained('Datadog/Toto-Open-Base-1.0').to(DEVICE)

# Optional: compile model for enhanced speed
toto.compile()

forecaster = TotoForecaster(toto.model)

# Example input series (7 variables, 4096 timesteps)
input_series = torch.randn(7, 4096).to(DEVICE)
timestamp_seconds = torch.zeros(7, 4096).to(DEVICE)
time_interval_seconds = torch.full((7,), 60*15).to(DEVICE)

inputs = MaskedTimeseries(
    series=input_series,
    padding_mask=torch.full_like(input_series, True, dtype=torch.bool),
    id_mask=torch.zeros_like(input_series),
    timestamp_seconds=timestamp_seconds,
    time_interval_seconds=time_interval_seconds,
)

# Generate forecasts for next 336 timesteps
forecast = forecaster.forecast(
    inputs,
    prediction_length=336,
    num_samples=256,
    samples_per_batch=256,
)

# Access results
mean_prediction = forecast.mean
prediction_samples = forecast.samples
lower_quantile = forecast.quantile(0.1)
upper_quantile = forecast.quantile(0.9)
```
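
To sanity-check the output, you can plot the forecast mean with an uncertainty band. This is a minimal visualization sketch, not part of the Toto API: it assumes `forecast.mean` and the quantile tensors have shape `(variates, prediction_length)`, matching the input layout above, and that `matplotlib` is installed.

```python
import matplotlib.pyplot as plt

# Adjust indexing if the output tensors carry an extra batch dimension.
variate = 0  # which of the 7 input series to plot
history = input_series[variate].detach().cpu().numpy()
mean = mean_prediction[variate].detach().cpu().numpy()
lo = lower_quantile[variate].detach().cpu().numpy()
hi = upper_quantile[variate].detach().cpu().numpy()

t_hist = range(len(history))
t_pred = range(len(history), len(history) + len(mean))

plt.plot(t_hist[-512:], history[-512:], label="history (last 512 steps)")
plt.plot(t_pred, mean, label="forecast mean")
plt.fill_between(t_pred, lo, hi, alpha=0.3, label="10-90% interval")
plt.legend()
plt.show()
```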

For detailed inference instructions, refer to the inference tutorial notebook.

### Performance Recommendations

- For optimal speed and reduced memory usage, install [xFormers](https://github.com/facebookresearch/xformers) and [flash-attention](https://github.com/Dao-AILab/flash-attention), then set `use_memory_efficient` to `True` (see the sketch below).
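
The exact mechanism for toggling this flag is not shown in this card; the following is a minimal sketch that assumes `use_memory_efficient` is exposed as an attribute on the loaded model. Check the GitHub repository for the actual API.

```python
# Hypothetical toggle: the recommendation above says to set
# `use_memory_efficient` to True, but where the flag lives (model attribute
# vs. config/kwarg) may differ in the actual codebase.
toto = Toto.from_pretrained('Datadog/Toto-Open-Base-1.0').to(DEVICE)
toto.use_memory_efficient = True  # assumed attribute name, per the note above
forecaster = TotoForecaster(toto.model)
```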


## 💾 Available Checkpoints

| Checkpoint | Parameters | Config | Size | Notes |
|------------|------------|--------|------|-------|
| Toto-Open-Base-1.0 | 151M | Config | 605 MB | Initial release with SOTA performance |

## ✨ Key Features

- Zero-Shot Forecasting
- Multi-Variate Support
- Decoder-Only Transformer Architecture
- Probabilistic Predictions (Student-T mixture model)
- Causal Patch-Wise Instance Normalization (see the sketch after this list)
- Extensive Pretraining on Large-Scale Data
- High-Dimensional Time Series Support
- Tailored for Observability Metrics
- State-of-the-Art Performance on GiftEval and BOOM
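
For intuition on the normalization scheme: standard instance normalization computes statistics over the whole series, which leaks future information in a causal decoder. A causal, patch-wise variant normalizes each patch using only statistics from the current and preceding timesteps. The following is an illustrative sketch of that idea, not Toto's actual implementation:

```python
import torch

def causal_patchwise_instance_norm(x: torch.Tensor, patch_size: int,
                                   eps: float = 1e-5) -> torch.Tensor:
    """Illustrative only: normalize each patch of a (variates, time) series
    using the mean/std of all timesteps up to and including that patch,
    so no future information leaks into earlier outputs."""
    variates, T = x.shape
    out = torch.empty_like(x)
    for start in range(0, T, patch_size):
        end = min(start + patch_size, T)
        prefix = x[:, :end]  # causal window: past + current patch only
        mean = prefix.mean(dim=1, keepdim=True)
        std = prefix.std(dim=1, unbiased=False, keepdim=True)
        out[:, start:end] = (x[:, start:end] - mean) / (std + eps)
    return out
```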

## 📚 Training Data Summary

- Observability Metrics: ~1 trillion points from Datadog internal systems (no customer data)
- Public Datasets: GiftEval Pretrain and the Chronos datasets (see the `datasets` metadata above)
- Synthetic Data: ~1/3 of training data

## 🔗 Additional Resources

- [Toto inference code on GitHub](https://github.com/DataDog/toto)

## 📖 Citation

If you use Toto in your research or applications, please cite it using the following BibTeX entry:

```bibtex
@misc{toto2025,
  title={This Time is Different: An Observability Perspective on Time Series Foundation Models},
  author={TODO},
  year={2025},
  eprint={arXiv:TODO},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
```