annamonica committed on
Commit 349d00a · verified · 1 Parent(s): 0ba857c

add model weights table, quick start code, link to github repo

Files changed (1)
  1. README.md +76 -17
README.md CHANGED
@@ -47,20 +47,31 @@ The model has been trained on a mixture of 2.36 trillion time series data points
47
  </em>
48
  </div>
49
 
50
- ## Key Features - TODO develop those or remove them
51
- <!-- TODO: Update this section to align with the introduction in the paper once finalized. -->
52
- - **Multi-Variate Time Series Support:** using **Proportional Factorized Space-Time Attention** that efficiently groups multivariate features, reducing computational overhead while maintaining high accuracy.
53
- - **Tailored for Observability:** Observability metrics are machine-generated time series collected in near-real-time to monitor and optimize the performance and reliability of modern infrastructure and applications.
54
- - **Decoder-Only Transformer Architecture**: supporting variable prediction horizons and lengths.
55
  - **Causal Patch-Wise Instance Normalization:** Improves forecasting performance and training stability in decoder-only models.
56
- - **Student-T Mixture Head for Point & Probabilistic Forecasting:** models complex, heavy-tailed distributions in observability data.
57
- - **Extensive Pretraining on Large-Scale Data:** Pretrained on 5–10× more data than leading time series foundation models, using a combination of synthetic, public, and observability-specific datasets.
58
  - **High-Dimensional Time Series Support:** Efficiently handles datasets with a large number of variables.
59
 
60
  ### Resources - TODO
61
 
62
  - **Paper:** "[Link to arxiv paper]"
63
- - **Repository:** "[Link to github repo]"
64
  - **Blog Post:** "[Link to Datadog BlogPost]"
65
  - **BOOM:** [Dataset card](https://huggingface.co/datasets/Datadog/BOOM)
66
 
@@ -68,24 +79,72 @@ The model has been trained on a mixture of 2.36 trillion time series data points
68
  ### Installation
69
 
70
  ```bash
71
- # TODO(Anna) - update these with correct instructions
72
  # Clone the repository
73
- git clone https://github.com/DataDogFutureOpenSource/TOTO.git
74
-
75
- # Navigate to the project directory
76
- cd foundation-models-research/toto
77
 
78
- # Install the required dependencies
79
  pip install -r requirements.txt
80
  ```
81
 
82
  ### Running an Inference
83
 
84
- For a step-by-step guide on running inferences with Toto, please refer to our [GitHub repository's inference tutorial notebook](https://github.com/DataDogFutureOpenSource/TOTO/XXX/notebooks/inference_tutorial.ipynb).
85
 
86
- ### Usage Recommendations - TODO remove or develop
87
  <!-- TODO: Share best practices for maybe optimal context length, prediction length?. -->
88
 
89
  ## Training Details - TODO keep or remove?
90
 
91
  ### PreTraining Data
@@ -97,7 +156,7 @@ For a step-by-step guide on running inferences with Toto, please refer to our [G
97
  | Observability |
98
 
99
 
100
- For more details about the pretraining data and preprocessing steps, please refer to the [paper](#TODO-Link-to-Paper) or the [GitHub repository](https://github.com/DataDogFutureOpenSource/TOTO).
101
 
102
  ### Training Hyperparameters - TODO keep or remove?
103
 
 
47
  </em>
48
  </div>
49
 
50
+ ### Key Features - TODO develop those or remove them
51
+
52
+ - **Zero-Shot Forecasting**: Perform forecasting without fine-tuning on your specific time series
53
+ - **Multi-Variate Support**: Efficiently process multiple variables using Proportional Factorized Space-Time Attention
54
+ - **Decoder-Only Transformer Architecture**: Supports variable prediction horizons and context lengths.
55
+ - **Probabilistic Predictions**: Generate both point forecasts and uncertainty estimates using a Student-T mixture model
56
  - **Causal Patch-Wise Instance Normalization:** Improves forecasting performance and training stability in decoder-only models.
57
+ - **Extensive Pretraining on Large-Scale Data:** Pretrained on 5–10× more data than leading time series foundation models, using a combination of synthetic, public, and observability-specific datasets. Note: this open-source, open-weights model was trained **without** any customer data.
58
  - **High-Dimensional Time Series Support:** Efficiently handles datasets with a large number of variables.
59
+ - **Tailored for Observability:** Observability metrics are machine-generated time series collected in near-real-time to monitor and optimize the performance and reliability of modern infrastructure and applications.
60
+ - **State-of-the-Art Performance**: Achieves top scores in benchmarks covering diverse time series forecasting tasks. This includes the established multi-domain benchmark [GiftEval](https://huggingface.co/spaces/Salesforce/GIFT-Eval), as well as our own observability-focused benchmark [BOOM](https://huggingface.co/datasets/Datadog/BOOM).
61
+
62
+ ### Model Weights
63
+
64
+ Currently available checkpoints:
65
+
66
+ | Checkpoint | Parameters | Config | Notes |
67
+ |------------|------------|--------|-------|
68
+ | [Toto-Open-Base-1.0](https://huggingface.co/Datadog/Toto-Open-Base-1.0/blob/main/model.safetensors) | 151M | [Config](https://huggingface.co/Datadog/Toto-Open-Base-1.0/blob/main/config.json) | The initial open release of Toto. Achieves state-of-the-art performance on both general-purpose and observability-focused benchmarking tasks, as described in our paper. |
69
+
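The table links point directly at the `model.safetensors` and `config.json` files on the Hugging Face Hub. As a minimal sketch, assuming only the standard `huggingface_hub` client (not used elsewhere in this README), the listed files can also be fetched programmatically:

```python
from huggingface_hub import hf_hub_download

# Fetch the checkpoint and config listed in the table above;
# each call returns the local cache path of the downloaded file.
weights_path = hf_hub_download(repo_id="Datadog/Toto-Open-Base-1.0", filename="model.safetensors")
config_path = hf_hub_download(repo_id="Datadog/Toto-Open-Base-1.0", filename="config.json")
print(weights_path, config_path)
```

In normal use the quick-start example below downloads the weights automatically via `Toto.from_pretrained`, so a manual fetch like this is only needed for offline or custom loading workflows.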
70
 
71
  ### Resources - TODO
72
 
73
  - **Paper:** "[Link to arxiv paper]"
74
+ - **Repository:** [Toto](https://github.com/DataDog/toto.git)
75
  - **Blog Post:** "[Link to Datadog BlogPost]"
76
  - **BOOM:** [Dataset card](https://huggingface.co/datasets/Datadog/BOOM)
77
 
 
79
  ### Installation
80
 
81
  ```bash
82
  # Clone the repository
83
+ git clone https://github.com/DataDog/toto.git
84
+ cd toto
85
 
86
+ # Install dependencies
87
  pip install -r requirements.txt
88
  ```
89
 
90
  ### Running an Inference
91
 
92
+ Here's a simple example to get you started with forecasting:
93
+
94
+ ```python
95
+ import torch
96
+ from data.util.dataset import MaskedTimeseries
97
+ from inference.forecaster import TotoForecaster
98
+ from model.toto import Toto
99
+
100
+ # Load the pre-trained model
101
+ toto = Toto.from_pretrained('Datadog/Toto-Open-Base-1.0')
102
+ toto.to('cuda') # Move to GPU
103
+
104
+ # Optionally compile the model for faster inference
105
+ toto.compile() # Uses Torch's JIT compilation for better performance
106
+
107
+ forecaster = TotoForecaster(toto.model)
108
+
109
+ # Prepare your input time series (channels, time_steps)
110
+ input_series = torch.randn(7, 4096).to('cuda') # Example with 7 variables and 4096 timesteps
111
+
112
+ # Prepare timestamp information (optional, but expected by API; not used by the current model release)
113
+ timestamp_seconds = torch.zeros(7, 4096).to('cuda')
114
+ time_interval_seconds = torch.full((7,), 60*15).to('cuda') # 15-minute intervals
115
+
116
+ # Create a MaskedTimeseries object
117
+ inputs = MaskedTimeseries(
118
+     series=input_series,
119
+     padding_mask=torch.full_like(input_series, True, dtype=torch.bool),
120
+     id_mask=torch.zeros_like(input_series),
121
+     timestamp_seconds=timestamp_seconds,
122
+     time_interval_seconds=time_interval_seconds,
123
+ )
124
+
125
+ # Generate forecasts for the next 336 timesteps
126
+ forecast = forecaster.forecast(
127
+     inputs,
128
+     prediction_length=336,
129
+     num_samples=256, # Number of samples for probabilistic forecasting
130
+     samples_per_batch=256, # Control memory usage during inference
131
+ )
132
+
133
+ # Access results
134
+ mean_prediction = forecast.mean # Point forecasts
135
+ prediction_samples = forecast.samples # Probabilistic samples
136
+ lower_quantile = forecast.quantile(0.1) # 10th percentile for lower confidence bound
137
+ upper_quantile = forecast.quantile(0.9) # 90th percentile for upper confidence bound
138
+ ```
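As a small illustrative follow-up, assuming the forecast attributes above are CUDA tensors (exact output shapes follow `TotoForecaster`'s conventions and should be inspected), the results can be moved to NumPy for plotting or persistence:

```python
import numpy as np

# Detach and move results off the GPU before analysis or plotting.
mean_np = forecast.mean.detach().cpu().numpy()
q10_np = forecast.quantile(0.1).detach().cpu().numpy()
q90_np = forecast.quantile(0.9).detach().cpu().numpy()

# Persist point forecasts and the 10%/90% quantile band for later use.
np.savez("toto_forecast.npz", mean=mean_np, q10=q10_np, q90=q90_np)
```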
139
 
140
+ For a step-by-step guide on running inferences with Toto, please refer to our [GitHub repository's inference tutorial notebook](https://github.com/DataDog/toto/blob/main/toto/notebooks/inference_tutorial.ipynb).
141
+
142
+ #### Usage Recommendations
143
  <!-- TODO: Share best practices for maybe optimal context length, prediction length?. -->
144
 
145
+ - For optimal inference speed, install [xformers](https://github.com/facebookresearch/xformers?tab=readme-ov-file#installing-xformers) and [flash-attention](https://github.com/Dao-AILab/flash-attention?tab=readme-ov-file#installation-and-features).
146
+ - If you're not using [xformers](https://github.com/facebookresearch/xformers?tab=readme-ov-file#installing-xformers) or your system lacks a recent NVIDIA GPU, set `memory_efficient_attention=False` to ensure compatibility and stable inference (see the hypothetical sketch after this list).
147
+
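A hypothetical sketch of the fallback above: it assumes the `memory_efficient_attention` flag can be passed as a keyword override when loading the model, which may not match the actual API, so check the repository's configuration documentation before relying on it.

```python
from model.toto import Toto

# Assumption: the flag is forwarded to the model configuration by from_pretrained.
# If it is not, set the equivalent option in the repository's config instead.
toto = Toto.from_pretrained(
    'Datadog/Toto-Open-Base-1.0',
    memory_efficient_attention=False,  # fall back to standard PyTorch attention
)
toto.to('cpu')  # e.g. when no recent NVIDIA GPU is available
```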
148
  ## Training Details - TODO keep or remove?
149
 
150
  ### PreTraining Data
 
156
  | Observability |
157
 
158
 
159
+ For more details about the pretraining data and preprocessing steps, please refer to the [paper](#TODO-Link-to-Paper) or the [GitHub repository](https://github.com/DataDog/toto.git).
160
 
161
  ### Training Hyperparameters - TODO keep or remove?
162