add model weights table, quick start code, link to github repo
README.md
</em>
</div>

### Key Features - TODO develop these or remove them

- **Zero-Shot Forecasting:** Perform forecasting without fine-tuning on your specific time series.
- **Multi-Variate Support:** Efficiently process multiple variables using Proportional Factorized Space-Time Attention.
- **Decoder-Only Transformer Architecture:** Supports variable prediction horizons and context lengths.
- **Probabilistic Predictions:** Generate both point forecasts and uncertainty estimates using a Student-T mixture model.
- **Causal Patch-Wise Instance Normalization:** Improves forecasting performance and training stability in decoder-only models.
- **Extensive Pretraining on Large-Scale Data:** Pretrained on 5–10× more data than leading time series foundation models, using a combination of synthetic, public, and observability-specific datasets. Note: this open-source, open-weights model was trained **without** any customer data.
- **High-Dimensional Time Series Support:** Efficiently handles datasets with a large number of variables.
- **Tailored for Observability:** Observability metrics are machine-generated time series collected in near-real-time to monitor and optimize the performance and reliability of modern infrastructure and applications.
- **State-of-the-Art Performance:** Achieves top scores on benchmarks covering diverse time series forecasting tasks, including the established multi-domain benchmark [GiftEval](https://huggingface.co/spaces/Salesforce/GIFT-Eval) and our own observability-focused benchmark [BOOM](https://huggingface.co/datasets/Datadog/BOOM).

### Model Weights

Currently available checkpoints:

| Checkpoint | Parameters | Config | Notes |
|------------|------------|--------|-------|
| [Toto-Open-Base-1.0](https://huggingface.co/Datadog/Toto-Open-Base-1.0/blob/main/model.safetensors) | 151M | [Config](https://huggingface.co/Datadog/Toto-Open-Base-1.0/blob/main/config.json) | The initial open release of Toto. Achieves state-of-the-art performance on both general-purpose and observability-focused benchmarking tasks, as described in our paper. |
|
72 |
|
73 |
- **Paper:** "[Link to arxiv paper]"
|
74 |
+
- **Repository:** [Toto](https://github.com/DataDog/toto.git)
|
75 |
- **Blog Post:** "[Link to Datadog BlogPost]"
|
76 |
- **BOOM:** [Dataset card](https://huggingface.co/datasets/Datadog/BOOM)
|
77 |
|
|
|
79 |

### Installation

```bash
# Clone the repository
git clone https://github.com/DataDog/toto.git
cd toto

# Install dependencies
pip install -r requirements.txt
```

### Running an Inference

Here's a simple example to get you started with forecasting:

```python
import torch
from data.util.dataset import MaskedTimeseries
from inference.forecaster import TotoForecaster
from model.toto import Toto

# Load the pre-trained model
toto = Toto.from_pretrained('Datadog/Toto-Open-Base-1.0')
toto.to('cuda')  # Move to GPU

# Optionally compile the model for faster inference
toto.compile()  # Uses Torch's JIT compilation for better performance

forecaster = TotoForecaster(toto.model)

# Prepare your input time series (channels, time_steps)
input_series = torch.randn(7, 4096).to('cuda')  # Example with 7 variables and 4096 timesteps

# Prepare timestamp information (optional, but expected by the API; not used by the current model release)
timestamp_seconds = torch.zeros(7, 4096).to('cuda')
time_interval_seconds = torch.full((7,), 60*15).to('cuda')  # 15-minute intervals

# Create a MaskedTimeseries object
inputs = MaskedTimeseries(
    series=input_series,
    padding_mask=torch.full_like(input_series, True, dtype=torch.bool),
    id_mask=torch.zeros_like(input_series),
    timestamp_seconds=timestamp_seconds,
    time_interval_seconds=time_interval_seconds,
)

# Generate forecasts for the next 336 timesteps
forecast = forecaster.forecast(
    inputs,
    prediction_length=336,
    num_samples=256,  # Number of samples for probabilistic forecasting
    samples_per_batch=256,  # Control memory usage during inference
)

# Access results
mean_prediction = forecast.mean  # Point forecasts
prediction_samples = forecast.samples  # Probabilistic samples
lower_quantile = forecast.quantile(0.1)  # 10th percentile for lower confidence bound
upper_quantile = forecast.quantile(0.9)  # 90th percentile for upper confidence bound
```
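
To sanity-check the output visually, you can plot the point forecast together with the quantile band. The sketch below is only an illustration: it assumes `matplotlib` is installed and that the forecast tensors come back with shape `(n_variates, prediction_length)`; if your outputs carry a batch dimension, adjust the indexing accordingly.

```python
# Illustrative plotting sketch (not part of the Toto API). Uses the tensors from
# the example above: input_series (context) plus mean_prediction, lower_quantile,
# and upper_quantile, assumed to have shape (n_variates, prediction_length).
import matplotlib.pyplot as plt

variate = 0  # plot the first variable
history = input_series[variate].detach().cpu().numpy()
mean = mean_prediction[variate].detach().cpu().numpy()
lo = lower_quantile[variate].detach().cpu().numpy()
hi = upper_quantile[variate].detach().cpu().numpy()

t_hist = list(range(len(history)))
t_pred = list(range(len(history), len(history) + len(mean)))

plt.figure(figsize=(12, 4))
plt.plot(t_hist[-512:], history[-512:], label="context (last 512 steps)")
plt.plot(t_pred, mean, label="forecast mean")
plt.fill_between(t_pred, lo, hi, alpha=0.3, label="10th-90th percentile")
plt.legend()
plt.tight_layout()
plt.show()
```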
139 |
|
140 |
+
For a step-by-step guide on running inferences with Toto, please refer to our [GitHub repository's inference tutorial notebook](https://github.com/DataDog/toto/blob/main/toto/notebooks/inference_tutorial.ipynb).
|
141 |
+
|
142 |
+
#### Usage Recommendations
|
143 |
<!-- TODO: Share best practices for maybe optimal context length, prediction length?. -->
|
144 |
|
145 |
+
- For optimal inference speed install [xformers](https://github.com/facebookresearch/xformers?tab=readme-ov-file#installing-xformers) and [flash-attention](https://github.com/Dao-AILab/flash-attention?tab=readme-ov-file#installation-and-features)
|
146 |
+
- If you're not using [xformers](https://github.com/facebookresearch/xformers?tab=readme-ov-file#installing-xformers) or your system lacks a recent NVIDIA GPU, set `memory_efficient_attention=False` to ensure compatibility and stable inference.
|
147 |
+
|
148 |
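
As a quick way to tell which situation you're in, the hypothetical check below tests whether xformers is importable and whether a CUDA GPU is visible to PyTorch; how the resulting flag is then passed to Toto is not covered here, so consult the repository for the exact configuration mechanism.

```python
# Hypothetical environment check: decide whether memory-efficient attention is
# likely to be usable (xformers installed and a CUDA GPU available). Wiring the
# resulting value into Toto is left to the repository's documented configuration.
import importlib.util

import torch

has_xformers = importlib.util.find_spec("xformers") is not None
has_cuda_gpu = torch.cuda.is_available()

memory_efficient_attention = has_xformers and has_cuda_gpu
print(f"memory_efficient_attention = {memory_efficient_attention}")
```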

## Training Details - TODO keep or remove?

### PreTraining Data

| Observability |

For more details about the pretraining data and preprocessing steps, please refer to the [paper](#TODO-Link-to-Paper) or the [GitHub repository](https://github.com/DataDog/toto.git).

### Training Hyperparameters - TODO keep or remove?