Toto (Time Series Optimized Transformer for [Observability](https://en.wikipedia.org/wiki/Observability_(software))) is a time-series foundation model designed for multi-variate time series forecasting, with a focus on observability metrics. Toto leverages new architectural innovations and training recipes that enable it to efficiently handle the high-dimensional, sparse, and non-stationary time series that are hallmarks of the observability domain.

The model has been trained on a mixture of 2.36 trillion time series data points, 43% of which are taken from real-world observability metrics (note: no customer data was used to train this model). Toto demonstrates state-of-the-art zero-shot performance on observability-specific tasks, as well as top-ranking performance (as of DATE) on the multi-domain time series forecasting benchmark GiftEval.

---
</em>
</div>

### Key Features

- **Zero-Shot Forecasting**: Perform forecasting without fine-tuning on your specific time series
- **Multi-Variate Support**: Efficiently process multiple variables using Proportional Factorized Space-Time Attention
- **Decoder-Only Transformer Architecture**: Supports variable prediction horizons and context lengths.
- **Probabilistic Predictions**: Generate both point forecasts and uncertainty estimates using a Student-T mixture model (an illustrative sketch follows this list)
- **Causal Patch-Wise Instance Normalization**: Improves forecasting performance and training stability in decoder-only models.
- **Extensive Pretraining on Large-Scale Data**: Pretrained on 5–10× more data than leading time series foundation models, using a combination of synthetic, public, and observability-specific datasets.
- **High-Dimensional Time Series Support**: Efficiently handles datasets with a large number of variables.
- **Tailored for Observability**: Observability metrics are machine-generated time series collected in near-real-time to monitor and optimize the performance and reliability of modern infrastructure and applications.
- **State-of-the-Art Performance**: Achieves top scores on benchmarks covering diverse time series forecasting tasks, including the established multi-domain benchmark [GiftEval](https://huggingface.co/spaces/Salesforce/GIFT-Eval) as well as our own observability-focused benchmark [BOOM](https://huggingface.co/datasets/Datadog/BOOM).
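
A quick way to build intuition for the probabilistic output is to look at the distribution family itself. The sketch below is illustrative only and is not the Toto API: it constructs a per-step Student-T mixture with plain PyTorch (random placeholder parameters standing in for real model outputs) and reads a point forecast and uncertainty bands off Monte Carlo samples.

```python
# Illustrative sketch (NOT the Toto API): a Student-T mixture per forecast step,
# with the point forecast and uncertainty bands read off sampled trajectories.
import torch
from torch.distributions import Categorical, MixtureSameFamily, StudentT

num_steps, num_components = 8, 4  # toy prediction horizon and mixture size

# Per-step mixture parameters; random placeholders instead of real model outputs.
weights = torch.softmax(torch.randn(num_steps, num_components), dim=-1)
df = 2.0 + 5.0 * torch.rand(num_steps, num_components)   # degrees of freedom
loc = torch.randn(num_steps, num_components)              # component locations
scale = 0.1 + torch.rand(num_steps, num_components)       # component scales

mixture = MixtureSameFamily(Categorical(probs=weights), StudentT(df, loc, scale))

# Sampled trajectories stand in for the model's forecast samples.
samples = mixture.sample((1000,))                          # shape: (1000, num_steps)

point_forecast = samples.median(dim=0).values              # point forecast per step
lower_band = samples.quantile(0.1, dim=0)                  # 10th percentile
upper_band = samples.quantile(0.9, dim=0)                  # 90th percentile
print(point_forecast.shape, lower_band.shape, upper_band.shape)
```

In Toto itself the equivalent quantities come from the forecast object returned by the model, for example the `forecast.quantile(0.9)` call used in the quickstart snippet.
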

For more information on Toto, including an architecture deep-dive and detailed evaluation results, see our [paper](#TODO-link-to-paper).

### Model Weights

Currently available checkpoints:

For a step-by-step guide on running inferences with Toto, please refer to our [GitHub repository's inference tutorial notebook](https://github.com/DataDog/toto/blob/main/toto/notebooks/inference_tutorial.ipynb).


#### Usage Recommendations

- For optimal inference speed, install [xformers](https://github.com/facebookresearch/xformers?tab=readme-ov-file#installing-xformers) and [flash-attention](https://github.com/Dao-AILab/flash-attention?tab=readme-ov-file#installation-and-features)
- If you're not using [xformers](https://github.com/facebookresearch/xformers?tab=readme-ov-file#installing-xformers) or your system lacks a recent NVIDIA GPU, set `memory_efficient_attention=False` to ensure compatibility and stable inference (a minimal sketch follows this list).
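
As a rough illustration of that recommendation, the sketch below detects whether xformers and a CUDA GPU are available and derives the value to use for the flag. The detection logic is plain Python/PyTorch; the commented-out loading call and checkpoint name are assumptions rather than the confirmed Toto interface, so refer to the inference tutorial notebook for the exact API.

```python
# Sketch: enable memory-efficient attention only when xformers and a CUDA GPU
# are actually present. The commented-out loading call is a placeholder, not
# the confirmed Toto API.
import importlib.util

import torch

has_xformers = importlib.util.find_spec("xformers") is not None
has_cuda_gpu = torch.cuda.is_available()
use_memory_efficient_attention = has_xformers and has_cuda_gpu

print(f"memory_efficient_attention={use_memory_efficient_attention}")

# Hypothetical usage (argument name from the recommendation above, loading call assumed):
# model = Toto.from_pretrained(
#     "<checkpoint-name>",
#     memory_efficient_attention=use_memory_efficient_attention,
# )
```
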

## Training Details - TODO keep or remove?

### Pre-Training Data

| Dataset |
|----------------------------------------------------------------------------------|
| [GiftEval Pretrain](https://huggingface.co/datasets/Salesforce/GiftEvalPretrain) |
| [Chronos](https://huggingface.co/datasets/autogluon/chronos_datasets) (Note: we use a subset of the Chronos dataset to avoid contamination with the GiftEval benchmark.) |
| Synthetic |
| Observability (**Note: No customer data was used in the training of this model**) |

For more details about the pretraining data and preprocessing steps, please refer to the [paper](#TODO-Link-to-Paper) or the [GitHub repository](https://github.com/DataDog/toto.git).

### Training Hyperparameters - TODO keep or remove?

The training hyperparameters for Toto are defined in the YAML configuration file located in our GitHub repository. You can find the configuration file [here](https://github.com/DataDogFutureOpenSource/TOTO/blob/main/configs/toto_config.yaml).
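
If you want to inspect those hyperparameters programmatically, a minimal sketch is below. It only assumes the linked file is standard YAML and that you have a local clone of the repository; no particular keys are guaranteed to exist.

```python
# Sketch: load the training hyperparameters from the linked YAML file and list
# the top-level sections. Assumes a local clone; no specific keys guaranteed.
import yaml  # pip install pyyaml

with open("configs/toto_config.yaml") as f:
    config = yaml.safe_load(f)

for key, value in config.items():
    print(key, "->", list(value.keys()) if isinstance(value, dict) else value)
```
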

## Results - TODO keep or remove?

| Dataset | CRPS | MASE |
|--------------------|-------|-------|
| [BOOM](https://huggingface.co/datasets/Datadog/BOOM) | TBD | TBD |
| [GiftEval](https://huggingface.co/datasets/Salesforce/GiftEval) | TBD | TBD |

For more detailed information, please refer to the results section in our [paper](#TODO-Link-to-Paper).
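
While the entries above are still TBD, the sketch below shows how the two reported metrics are commonly computed. These are the standard textbook definitions (a sample-based CRPS estimate and MASE with a seasonal-naive scaling term), not necessarily the exact evaluation code used by the benchmarks.

```python
# Standard (not benchmark-specific) implementations of the metrics in the table:
# sample-based CRPS and MASE scaled by a seasonal-naive in-sample error.
import numpy as np

def crps_from_samples(samples: np.ndarray, target: np.ndarray) -> float:
    """Monte Carlo CRPS estimate: E|X - y| - 0.5 * E|X - X'|, averaged over the horizon.

    samples: (num_samples, horizon) forecast draws; target: (horizon,) ground truth.
    """
    abs_error = np.abs(samples - target[None, :]).mean()
    spread = np.abs(samples[:, None, :] - samples[None, :, :]).mean()
    return float(abs_error - 0.5 * spread)

def mase(forecast: np.ndarray, target: np.ndarray, history: np.ndarray, season: int = 1) -> float:
    """Forecast MAE scaled by the in-sample MAE of a seasonal-naive forecast."""
    naive_mae = np.abs(history[season:] - history[:-season]).mean()
    return float(np.abs(forecast - target).mean() / naive_mae)

# Toy example with random data, just to show the expected shapes.
rng = np.random.default_rng(0)
history, target = rng.normal(size=200), rng.normal(size=24)
samples = rng.normal(size=(500, 24))
print(crps_from_samples(samples, target), mase(samples.mean(axis=0), target, history))
```
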

## Citation - TODO