ben-cohen-datadog committed on
Commit 7bf285f · verified · 1 Parent(s): 349d00a

minor tweaks

Files changed (1)
  1. README.md +7 -19
README.md CHANGED
@@ -27,7 +27,7 @@ license: apache-2.0

  Toto (Time Series Optimized Transformer for [Observability](https://en.wikipedia.org/wiki/Observability_(software))) is a time-series foundation model designed for multi-variate time series forecasting with a focus on observability metrics. Toto leverages new architectural innovations and training recipes making it able to efficiently handle high-dimensional, sparse, and non-stationary time series that are hallmarks of the observability domain.

- The model has been trained on a mixture of 2.36 trillion time series data points, 43% of which are data points taken from real-world observability metrics. Toto demonstrates state-of-the-art zero-shot performance on observability-specific tasks as well as a top ranking performance (as of DATE) on the multi-domain time series forecasting GiftEval benchmark.
+ The model has been trained on a mixture of 2.36 trillion time series data points, 43% of which are data points taken from real-world observability metrics (note: no customer data was used to train this model). Toto demonstrates state-of-the-art zero-shot performance on observability-specific tasks as well as a top ranking performance (as of DATE) on the multi-domain time series forecasting GiftEval benchmark.

  ---

@@ -47,18 +47,20 @@ The model has been trained on a mixture of 2.36 trillion time series data points
  </em>
  </div>

- ### Key Features - TODO develop those or remove them
+ ### Key Features

  - **Zero-Shot Forecasting**: Perform forecasting without fine-tuning on your specific time series
  - **Multi-Variate Support**: Efficiently process multiple variables using Proportional Factorized Space-Time Attention
  - **Decoder-Only Transformer Architecture**: supporting variable prediction horizons and context lengths.
  - **Probabilistic Predictions**: Generate both point forecasts and uncertainty estimates using a Student-T mixture model
  - **Causal Patch-Wise Instance Normalization:** Improves forecasting performance and training stability in decoder-only models.
- - **Extensive Pretraining on Large-Scale Data:** Pretrained on 5–10× more data than leading time series foundation models, using a combination of synthetic, public, and observability-specific datasets. Note: this open-source, open-weights model was training **without** any customer data.
+ - **Extensive Pretraining on Large-Scale Data:** Pretrained on 5–10× more data than leading time series foundation models, using a combination of synthetic, public, and observability-specific datasets.
  - **High-Dimensional Time Series Support:** Efficiently handles datasets with a large number of variables.
  - **Tailored for Observability:** Observability metrics are machine-generated time series collected in near-real-time to monitor and optimize the performance and reliability of modern infrastructure and applications.
  - **State-of-the-Art Performance**: Achieves top scores in benchmarks covering diverse time series forecasting tasks. This includes the established multi-domain benchmark [GiftEval](https://huggingface.co/spaces/Salesforce/GIFT-Eval), as well as our own observability-focused benchmark [BOOM](https://huggingface.co/datasets/Datadog/BOOM).

+ For more information on Toto, including an architecture deep-dive and detailed evaluation results, see our [paper](#TODO-link-to-paper).
+
  ### Model Weights

  Currently available checkpoints:
@@ -140,37 +142,23 @@ upper_quantile = forecast.quantile(0.9) # 90th percentile for upper confidence
  For a step-by-step guide on running inferences with Toto, please refer to our [GitHub repository's inference tutorial notebook](https://github.com/DataDog/toto/blob/main/toto/notebooks/inference_tutorial.ipynb).

  #### Usage Recommendations
- <!-- TODO: Share best practices for maybe optimal context length, prediction length?. -->

  - For optimal inference speed install [xformers](https://github.com/facebookresearch/xformers?tab=readme-ov-file#installing-xformers) and [flash-attention](https://github.com/Dao-AILab/flash-attention?tab=readme-ov-file#installation-and-features)
  - If you're not using [xformers](https://github.com/facebookresearch/xformers?tab=readme-ov-file#installing-xformers) or your system lacks a recent NVIDIA GPU, set `memory_efficient_attention=False` to ensure compatibility and stable inference.

  ## Training Details - TODO keep or remove?

- ### PreTraining Data
+ ### Pre-Training Data
  | Dataset |
  |----------------------------------------------------------------------------------|
  | [GiftEval Pretrain](https://huggingface.co/datasets/Salesforce/GiftEvalPretrain) |
  | [Chronos](https://huggingface.co/datasets/autogluon/chronos_datasets) (Note: we use a subset of the Chronos dataset to avoid contamination with the GiftEval benchmark.) |
  | Synthetic |
- | Observability |
+ | Observability (**Note: No customer data was used in the training of this model**) |


  For more details about the pretraining data and preprocessing steps, please refer to the [paper](#TODO-Link-to-Paper) or the [GitHub repository](https://github.com/DataDog/toto.git).

- ### Training Hyperparameters - TODO keep or remove?
-
- The training hyperparameters for Toto are defined in the YAML configuration file located in our GitHub repository. You can find the configuration file [here](https://github.com/DataDogFutureOpenSource/TOTO/blob/main/configs/toto_config.yaml).
-
- ## Results - TODO keep or remove?
-
- | Dataset | CRPS | MASE |
- |--------------------|-------|-------|
- | [BOOM](https://huggingface.co/datasets/Datadog/BOOM) | TBD | TBD |
- | [GiftEval](https://huggingface.co/datasets/Salesforce/GiftEval) | TBD | TBD |
-
- For more detailed information, please refer to the results section in our [paper](#TODO-Link-to-Paper).
-

  ## Citation - TODO

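The third hunk's header quotes `upper_quantile = forecast.quantile(0.9)` from the README's quickstart, which pairs with the **Probabilistic Predictions** feature above (a Student-T mixture output head). As a rough illustration of how that quantile accessor can be turned into a point forecast plus an uncertainty band, here is a minimal sketch; the helper name, the argument defaults, and the use of `quantile(0.5)` as the point forecast are illustrative assumptions, not part of the README:

```python
from typing import Any, Tuple


def forecast_bands(forecast: Any, lower_q: float = 0.1, upper_q: float = 0.9) -> Tuple[Any, Any, Any]:
    """Derive a point forecast and an uncertainty band from a Toto probabilistic forecast.

    Only the `forecast.quantile(q)` accessor quoted in the hunk header above is
    assumed; the returned objects keep whatever tensor type the forecaster produces.
    """
    median = forecast.quantile(0.5)     # point forecast: median of the predictive distribution
    lower = forecast.quantile(lower_q)  # e.g. 10th percentile -> lower edge of the band
    upper = forecast.quantile(upper_q)  # e.g. 90th percentile -> upper edge of the band
    return median, lower, upper
```

Because the output head is a Student-T mixture, these quantiles come directly from the predictive distribution rather than from a separately fitted error model.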
 
 
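On the usage recommendation to set `memory_efficient_attention=False` when xformers or a recent NVIDIA GPU is unavailable: the environment check itself is easy to automate. Below is a minimal sketch of that probe, assuming the resulting boolean is then passed wherever the Toto code exposes the `memory_efficient_attention` option (that plumbing is not shown here):

```python
import importlib.util

import torch


def memory_efficient_attention_supported() -> bool:
    """Return True only if xformers is importable and a CUDA-capable GPU is visible.

    Mirrors the README recommendation: without xformers or a recent NVIDIA GPU,
    memory-efficient attention should be disabled for compatibility.
    """
    has_xformers = importlib.util.find_spec("xformers") is not None
    return has_xformers and torch.cuda.is_available()


# Illustrative usage: feed the probe result into whatever config or constructor
# argument accepts `memory_efficient_attention` in the Toto codebase.
memory_efficient_attention = memory_efficient_attention_supported()
print(f"memory_efficient_attention={memory_efficient_attention}")
```

The probe deliberately checks only importability and CUDA visibility; a stricter version could also gate on GPU compute capability, which this sketch does not attempt.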