annamonica committed on
Commit 349d00a · verified · 1 Parent(s): 0ba857c

add model weights table, quick start code, link to github repo

Files changed (1)
  1. README.md +76 -17
README.md CHANGED
@@ -47,20 +47,31 @@ The model has been trained on a mixture of 2.36 trillion time series data points
47
  </em>
48
  </div>
49
 
50
- ## Key Features - TODO develop those or remove them
51
- <!-- TODO: Update this section to align with the introduction in the paper once finalized. -->
52
- - **Multi-Variate Time Series Support:** using **Proportional Factorized Space-Time Attention** that efficiently groups multivariate features, reducing computational overhead while maintaining high accuracy.
53
- - **Tailored for Observability:** Observability metrics are machine-generated time series collected in near-real-time to monitor and optimize the performance and reliability of modern infrastructure and applications.
54
- - **Decoder-Only Transformer Architecture**: supporting variable prediction horizons and lengths.
55
  - **Causal Patch-Wise Instance Normalization:** Improves forecasting performance and training stability in decoder-only models.
56
- - **Student-T Mixture Head for Point & Probabilistic Forecasting:** models complex, heavy-tailed distributions in observability data.
57
- - **Extensive Pretraining on Large-Scale Data:** Pretrained on 5–10× more data than leading time series foundation models, using a combination of synthetic, public, and observability-specific datasets.
58
  - **High-Dimensional Time Series Support:** Efficiently handles datasets with a large number of variables.
59
 
60
  ### Resources - TODO
61
 
62
  - **Paper:** "[Link to arxiv paper]"
63
- - **Repository:** "[Link to github repo]"
64
  - **Blog Post:** "[Link to Datadog BlogPost]"
65
  - **BOOM:** [Dataset card](https://huggingface.co/datasets/Datadog/BOOM)
66
 
@@ -68,24 +79,72 @@ The model has been trained on a mixture of 2.36 trillion time series data points
68
  ### Installation
69
 
70
  ```bash
71
- # TODO(Anna) - update these with correct instructions
72
  # Clone the repository
73
- git clone https://github.com/DataDogFutureOpenSource/TOTO.git
74
-
75
- # Navigate to the project directory
76
- cd foundation-models-research/toto
77
 
78
- # Install the required dependencies
79
  pip install -r requirements.txt
80
  ```
81
 
82
  ### Running an Inference
83
 
84
- For a step-by-step guide on running inferences with Toto, please refer to our [GitHub repository's inference tutorial notebook](https://github.com/DataDogFutureOpenSource/TOTO/XXX/notebooks/inference_tutorial.ipynb).
85
 
86
- ### Usage Recommendations - TODO remove or develop
87
  <!-- TODO: Share best practices for maybe optimal context length, prediction length?. -->
88
 
89
  ## Training Details - TODO keep or remove?
90
 
91
  ### PreTraining Data
@@ -97,7 +156,7 @@ For a step-by-step guide on running inferences with Toto, please refer to our [G
97
  | Observability |
98
 
99
 
100
- For more details about the pretraining data and preprocessing steps, please refer to the [paper](#TODO-Link-to-Paper) or the [GitHub repository](https://github.com/DataDogFutureOpenSource/TOTO).
101
 
102
  ### Training Hyperparameters - TODO keep or remove?
103
 
 
47
  </em>
48
  </div>
49
 
50
+ ### Key Features - TODO develop those or remove them
51
+
52
+ - **Zero-Shot Forecasting**: Perform forecasting without fine-tuning on your specific time series
53
+ - **Multi-Variate Support**: Efficiently process multiple variables using Proportional Factorized Space-Time Attention
54
+ - **Decoder-Only Transformer Architecture**: Supports variable prediction horizons and context lengths.
55
+ - **Probabilistic Predictions**: Generate both point forecasts and uncertainty estimates using a Student-T mixture model
56
  - **Causal Patch-Wise Instance Normalization:** Improves forecasting performance and training stability in decoder-only models.
57
+ - **Extensive Pretraining on Large-Scale Data:** Pretrained on 5–10× more data than leading time series foundation models, using a combination of synthetic, public, and observability-specific datasets. Note: this open-source, open-weights model was trained **without** any customer data.
58
  - **High-Dimensional Time Series Support:** Efficiently handles datasets with a large number of variables.
59
+ - **Tailored for Observability:** Observability metrics are machine-generated time series collected in near-real-time to monitor and optimize the performance and reliability of modern infrastructure and applications.
60
+ - **State-of-the-Art Performance**: Achieves top scores in benchmarks covering diverse time series forecasting tasks. This includes the established multi-domain benchmark [GiftEval](https://huggingface.co/spaces/Salesforce/GIFT-Eval), as well as our own observability-focused benchmark [BOOM](https://huggingface.co/datasets/Datadog/BOOM).
61
+
62
+ ### Model Weights
63
+
64
+ Currently available checkpoints:
65
+
66
+ | Checkpoint | Parameters | Config | Notes |
67
+ |------------|------------|--------|-------|
68
+ | [Toto-Open-Base-1.0](https://huggingface.co/Datadog/Toto-Open-Base-1.0/blob/main/model.safetensors) | 151M | [Config](https://huggingface.co/Datadog/Toto-Open-Base-1.0/blob/main/config.json) | The initial open release of Toto. Achieves state-of-the-art performance on both general-purpose and observability-focused benchmarking tasks, as described in our paper. |
69
+
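The table links point directly at the `model.safetensors` and `config.json` files on the Hugging Face Hub. As a minimal sketch, assuming only the standard `huggingface_hub` client (not used elsewhere in this README), the listed files can also be fetched programmatically:

```python
from huggingface_hub import hf_hub_download

# Fetch the checkpoint and config listed in the table above;
# each call returns the local cache path of the downloaded file.
weights_path = hf_hub_download(repo_id="Datadog/Toto-Open-Base-1.0", filename="model.safetensors")
config_path = hf_hub_download(repo_id="Datadog/Toto-Open-Base-1.0", filename="config.json")
print(weights_path, config_path)
```

In normal use the quick-start example below downloads the weights automatically via `Toto.from_pretrained`, so a manual fetch like this is only needed for offline or custom loading workflows.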
70
 
71
  ### Resources - TODO
72
 
73
  - **Paper:** "[Link to arxiv paper]"
74
+ - **Repository:** [Toto](https://github.com/DataDog/toto.git)
75
  - **Blog Post:** "[Link to Datadog BlogPost]"
76
  - **BOOM:** [Dataset card](https://huggingface.co/datasets/Datadog/BOOM)
77
 
 
79
  ### Installation
80
 
81
  ```bash
82
  # Clone the repository
83
+ git clone https://github.com/DataDog/toto.git
84
+ cd toto
85
 
86
+ # Install dependencies
87
  pip install -r requirements.txt
88
  ```
89
 
90
  ### Running an Inference
91
 
92
+ Here's a simple example to get you started with forecasting:
93
+
94
+ ```python
95
+ import torch
96
+ from data.util.dataset import MaskedTimeseries
97
+ from inference.forecaster import TotoForecaster
98
+ from model.toto import Toto
99
+
100
+ # Load the pre-trained model
101
+ toto = Toto.from_pretrained('Datadog/Toto-Open-Base-1.0')
102
+ toto.to('cuda') # Move to GPU
103
+
104
+ # Optionally compile the model for faster inference
105
+ toto.compile() # Uses Torch's JIT compilation for better performance
106
+
107
+ forecaster = TotoForecaster(toto.model)
108
+
109
+ # Prepare your input time series (channels, time_steps)
110
+ input_series = torch.randn(7, 4096).to('cuda') # Example with 7 variables and 4096 timesteps
111
+
112
+ # Prepare timestamp information (optional, but expected by API; not used by the current model release)
113
+ timestamp_seconds = torch.zeros(7, 4096).to('cuda')
114
+ time_interval_seconds = torch.full((7,), 60*15).to('cuda') # 15-minute intervals
115
+
116
+ # Create a MaskedTimeseries object
117
+ inputs = MaskedTimeseries(
118
+     series=input_series,
119
+     padding_mask=torch.full_like(input_series, True, dtype=torch.bool),
120
+     id_mask=torch.zeros_like(input_series),
121
+     timestamp_seconds=timestamp_seconds,
122
+     time_interval_seconds=time_interval_seconds,
123
+ )
124
+
125
+ # Generate forecasts for the next 336 timesteps
126
+ forecast = forecaster.forecast(
127
+     inputs,
128
+     prediction_length=336,
129
+     num_samples=256, # Number of samples for probabilistic forecasting
130
+     samples_per_batch=256, # Control memory usage during inference
131
+ )
132
+
133
+ # Access results
134
+ mean_prediction = forecast.mean # Point forecasts
135
+ prediction_samples = forecast.samples # Probabilistic samples
136
+ lower_quantile = forecast.quantile(0.1) # 10th percentile for lower confidence bound
137
+ upper_quantile = forecast.quantile(0.9) # 90th percentile for upper confidence bound
138
+ ```
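As a small illustrative follow-up, assuming the forecast attributes above are CUDA tensors (exact output shapes follow `TotoForecaster`'s conventions and should be inspected), the results can be moved to NumPy for plotting or persistence:

```python
import numpy as np

# Detach and move results off the GPU before analysis or plotting.
mean_np = forecast.mean.detach().cpu().numpy()
q10_np = forecast.quantile(0.1).detach().cpu().numpy()
q90_np = forecast.quantile(0.9).detach().cpu().numpy()

# Persist point forecasts and the 10%/90% quantile band for later use.
np.savez("toto_forecast.npz", mean=mean_np, q10=q10_np, q90=q90_np)
```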
139
 
140
+ For a step-by-step guide on running inferences with Toto, please refer to our [GitHub repository's inference tutorial notebook](https://github.com/DataDog/toto/blob/main/toto/notebooks/inference_tutorial.ipynb).
141
+
142
+ #### Usage Recommendations
143
  <!-- TODO: Share best practices for maybe optimal context length, prediction length?. -->
144
 
145
+ - For optimal inference speed, install [xformers](https://github.com/facebookresearch/xformers?tab=readme-ov-file#installing-xformers) and [flash-attention](https://github.com/Dao-AILab/flash-attention?tab=readme-ov-file#installation-and-features).
146
+ - If you're not using [xformers](https://github.com/facebookresearch/xformers?tab=readme-ov-file#installing-xformers) or your system lacks a recent NVIDIA GPU, set `memory_efficient_attention=False` to ensure compatibility and stable inference (see the hypothetical sketch after this list).
147
+
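A hypothetical sketch of the fallback above: it assumes the `memory_efficient_attention` flag can be passed as a keyword override when loading the model, which may not match the actual API, so check the repository's configuration documentation before relying on it.

```python
from model.toto import Toto

# Assumption: the flag is forwarded to the model configuration by from_pretrained.
# If it is not, set the equivalent option in the repository's config instead.
toto = Toto.from_pretrained(
    'Datadog/Toto-Open-Base-1.0',
    memory_efficient_attention=False,  # fall back to standard PyTorch attention
)
toto.to('cpu')  # e.g. when no recent NVIDIA GPU is available
```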
148
  ## Training Details - TODO keep or remove?
149
 
150
  ### PreTraining Data
 
156
  | Observability |
157
 
158
 
159
+ For more details about the pretraining data and preprocessing steps, please refer to the [paper](#TODO-Link-to-Paper) or the [GitHub repository](https://github.com/DataDog/toto.git).
160
 
161
  ### Training Hyperparameters - TODO keep or remove?
162