Toto (Time Series Optimized Transformer for [Observability](https://en.wikipedia.org/wiki/Observability_(software))) is a time-series foundation model designed for multi-variate time series forecasting, with a focus on observability metrics. Toto leverages new architectural innovations and training recipes that enable it to efficiently handle the high-dimensional, sparse, and non-stationary time series that are hallmarks of the observability domain.

The model has been trained on a mixture of 2.36 trillion time series data points, 43% of which are taken from real-world observability metrics (note: no customer data was used to train this model). Toto demonstrates state-of-the-art zero-shot performance on observability-specific tasks, as well as top-ranking performance (as of DATE) on the multi-domain time series forecasting benchmark GiftEval.

---
</em>
</div>

### Key Features

- **Zero-Shot Forecasting**: Perform forecasting without fine-tuning on your specific time series
- **Multi-Variate Support**: Efficiently process multiple variables using Proportional Factorized Space-Time Attention
- **Decoder-Only Transformer Architecture**: Supports variable prediction horizons and context lengths.
- **Probabilistic Predictions**: Generate both point forecasts and uncertainty estimates using a Student-T mixture model (an illustrative sketch follows this list)
- **Causal Patch-Wise Instance Normalization**: Improves forecasting performance and training stability in decoder-only models.
- **Extensive Pretraining on Large-Scale Data**: Pretrained on 5–10× more data than leading time series foundation models, using a combination of synthetic, public, and observability-specific datasets.
- **High-Dimensional Time Series Support**: Efficiently handles datasets with a large number of variables.
- **Tailored for Observability**: Observability metrics are machine-generated time series collected in near-real-time to monitor and optimize the performance and reliability of modern infrastructure and applications.
- **State-of-the-Art Performance**: Achieves top scores on benchmarks covering diverse time series forecasting tasks, including the established multi-domain benchmark [GiftEval](https://huggingface.co/spaces/Salesforce/GIFT-Eval) as well as our own observability-focused benchmark [BOOM](https://huggingface.co/datasets/Datadog/BOOM).
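
A quick way to build intuition for the probabilistic output is to look at the distribution family itself. The sketch below is illustrative only and is not the Toto API: it constructs a per-step Student-T mixture with plain PyTorch (random placeholder parameters standing in for real model outputs) and reads a point forecast and uncertainty bands off Monte Carlo samples.

```python
# Illustrative sketch (NOT the Toto API): a Student-T mixture per forecast step,
# with the point forecast and uncertainty bands read off sampled trajectories.
import torch
from torch.distributions import Categorical, MixtureSameFamily, StudentT

num_steps, num_components = 8, 4  # toy prediction horizon and mixture size

# Per-step mixture parameters; random placeholders instead of real model outputs.
weights = torch.softmax(torch.randn(num_steps, num_components), dim=-1)
df = 2.0 + 5.0 * torch.rand(num_steps, num_components)   # degrees of freedom
loc = torch.randn(num_steps, num_components)              # component locations
scale = 0.1 + torch.rand(num_steps, num_components)       # component scales

mixture = MixtureSameFamily(Categorical(probs=weights), StudentT(df, loc, scale))

# Sampled trajectories stand in for the model's forecast samples.
samples = mixture.sample((1000,))                          # shape: (1000, num_steps)

point_forecast = samples.median(dim=0).values              # point forecast per step
lower_band = samples.quantile(0.1, dim=0)                  # 10th percentile
upper_band = samples.quantile(0.9, dim=0)                  # 90th percentile
print(point_forecast.shape, lower_band.shape, upper_band.shape)
```

In Toto itself the equivalent quantities come from the forecast object returned by the model, for example the `forecast.quantile(0.9)` call used in the quickstart snippet.
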

For more information on Toto, including an architecture deep-dive and detailed evaluation results, see our [paper](#TODO-link-to-paper).

### Model Weights

Currently available checkpoints:

For a step-by-step guide on running inferences with Toto, please refer to our [GitHub repository's inference tutorial notebook](https://github.com/DataDog/toto/blob/main/toto/notebooks/inference_tutorial.ipynb).


#### Usage Recommendations

- For optimal inference speed, install [xformers](https://github.com/facebookresearch/xformers?tab=readme-ov-file#installing-xformers) and [flash-attention](https://github.com/Dao-AILab/flash-attention?tab=readme-ov-file#installation-and-features)
- If you're not using [xformers](https://github.com/facebookresearch/xformers?tab=readme-ov-file#installing-xformers) or your system lacks a recent NVIDIA GPU, set `memory_efficient_attention=False` to ensure compatibility and stable inference (a minimal sketch follows this list).
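
As a rough illustration of that recommendation, the sketch below detects whether xformers and a CUDA GPU are available and derives the value to use for the flag. The detection logic is plain Python/PyTorch; the commented-out loading call and checkpoint name are assumptions rather than the confirmed Toto interface, so refer to the inference tutorial notebook for the exact API.

```python
# Sketch: enable memory-efficient attention only when xformers and a CUDA GPU
# are actually present. The commented-out loading call is a placeholder, not
# the confirmed Toto API.
import importlib.util

import torch

has_xformers = importlib.util.find_spec("xformers") is not None
has_cuda_gpu = torch.cuda.is_available()
use_memory_efficient_attention = has_xformers and has_cuda_gpu

print(f"memory_efficient_attention={use_memory_efficient_attention}")

# Hypothetical usage (argument name from the recommendation above, loading call assumed):
# model = Toto.from_pretrained(
#     "<checkpoint-name>",
#     memory_efficient_attention=use_memory_efficient_attention,
# )
```
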

## Training Details - TODO keep or remove?

### Pre-Training Data

| Dataset |
|----------------------------------------------------------------------------------|
| [GiftEval Pretrain](https://huggingface.co/datasets/Salesforce/GiftEvalPretrain) |
| [Chronos](https://huggingface.co/datasets/autogluon/chronos_datasets) (Note: we use a subset of the Chronos dataset to avoid contamination with the GiftEval benchmark.) |
| Synthetic |
| Observability (**Note: No customer data was used in the training of this model**) |

For more details about the pretraining data and preprocessing steps, please refer to the [paper](#TODO-Link-to-Paper) or the [GitHub repository](https://github.com/DataDog/toto.git).

### Training Hyperparameters - TODO keep or remove?

The training hyperparameters for Toto are defined in the YAML configuration file located in our GitHub repository. You can find the configuration file [here](https://github.com/DataDogFutureOpenSource/TOTO/blob/main/configs/toto_config.yaml).
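
If you want to inspect those hyperparameters programmatically, a minimal sketch is below. It only assumes the linked file is standard YAML and that you have a local clone of the repository; no particular keys are guaranteed to exist.

```python
# Sketch: load the training hyperparameters from the linked YAML file and list
# the top-level sections. Assumes a local clone; no specific keys guaranteed.
import yaml  # pip install pyyaml

with open("configs/toto_config.yaml") as f:
    config = yaml.safe_load(f)

for key, value in config.items():
    print(key, "->", list(value.keys()) if isinstance(value, dict) else value)
```
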

## Results - TODO keep or remove?

| Dataset | CRPS | MASE |
|--------------------|-------|-------|
| [BOOM](https://huggingface.co/datasets/Datadog/BOOM) | TBD | TBD |
| [GiftEval](https://huggingface.co/datasets/Salesforce/GiftEval) | TBD | TBD |

For more detailed information, please refer to the results section in our [paper](#TODO-Link-to-Paper).
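
While the entries above are still TBD, the sketch below shows how the two reported metrics are commonly computed. These are the standard textbook definitions (a sample-based CRPS estimate and MASE with a seasonal-naive scaling term), not necessarily the exact evaluation code used by the benchmarks.

```python
# Standard (not benchmark-specific) implementations of the metrics in the table:
# sample-based CRPS and MASE scaled by a seasonal-naive in-sample error.
import numpy as np

def crps_from_samples(samples: np.ndarray, target: np.ndarray) -> float:
    """Monte Carlo CRPS estimate: E|X - y| - 0.5 * E|X - X'|, averaged over the horizon.

    samples: (num_samples, horizon) forecast draws; target: (horizon,) ground truth.
    """
    abs_error = np.abs(samples - target[None, :]).mean()
    spread = np.abs(samples[:, None, :] - samples[None, :, :]).mean()
    return float(abs_error - 0.5 * spread)

def mase(forecast: np.ndarray, target: np.ndarray, history: np.ndarray, season: int = 1) -> float:
    """Forecast MAE scaled by the in-sample MAE of a seasonal-naive forecast."""
    naive_mae = np.abs(history[season:] - history[:-season]).mean()
    return float(np.abs(forecast - target).mean() / naive_mae)

# Toy example with random data, just to show the expected shapes.
rng = np.random.default_rng(0)
history, target = rng.normal(size=200), rng.normal(size=24)
samples = rng.normal(size=(500, 24))
print(crps_from_samples(samples, target), mase(samples.mean(axis=0), target, history))
```
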

## Citation - TODO