---
license: apache-2.0
datasets:
- Major-TOM/Core-S2L2A
- Major-TOM/Core-S2L1C
- Major-TOM/Core-S1RTC
tags:
- Earth Observation
- Foundation Model
- Remote Sensing
---
# TerraFM: A Scalable Foundation Model for Unified Multisensor Earth Observation
<p align="center">
    <img src="https://i.imgur.com/waxVImv.png" alt="Oryx TerraFM">
</p>

[![paper](https://img.shields.io/badge/arXiv-Paper-<COLOR>.svg)](https://arxiv.org/abs/2506.06281)
[![code](https://img.shields.io/badge/GitHub-Code-blue.svg)](https://github.com/mbzuai-oryx/TerraFM)
[![Model Zoo](https://img.shields.io/badge/Model%20Zoo-HuggingFace-blue)](#🧠-model-zoo)

---

## 📢 Latest Updates
- **Jun-09-25**: 🚀 Initial release of **TerraFM codebase** and **pretrained models**
- **Jun-09-25**: 📄 Paper released on arXiv: [arxiv link](https://arxiv.org/abs/2506.06281). 🔥🔥

---

## 🌍 Overview

**TerraFM** is a scalable foundation model designed for unified processing of multisensor Earth Observation (EO) data. Built on a ViT backbone and trained over **18.7M tiles (~23T pixels)** from Sentinel-1 SAR and Sentinel-2 optical imagery, TerraFM unifies modality-specific inputs using:

- 🧩 Modality-specific patch embeddings  
- 🌀 Adaptive cross-attention fusion  
- 🎯 Dual-centering regularization for long-tailed distributions

TerraFM sets a new benchmark on **GEO-Bench** and **Copernicus-Bench**, demonstrating strong generalization across geographies, modalities, and tasks — including classification, segmentation, and landslide detection.
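
As a rough illustration of the first component, the sketch below gives each modality its own patch-projection layer and maps all of them to a shared token width. It is not the code in `terrafm.py`: the class name `ModalityPatchEmbed` is hypothetical, the channel counts for S2-L1C and S1 are assumptions (only the 12-channel S2-L2A input appears in this README), and the embedding width of 768 is the usual ViT-Base value.

```python
import torch
import torch.nn as nn

# Assumed channel counts; only the 12-channel S2-L2A case comes from this README.
CHANNELS = {"S2L2A": 12, "S2L1C": 13, "S1RTC": 2}


class ModalityPatchEmbed(nn.Module):
    """Hypothetical sketch: one patch projection per modality, shared output width."""

    def __init__(self, channels=CHANNELS, patch_size=16, embed_dim=768):
        super().__init__()
        # A strided convolution cuts the image into non-overlapping patches
        # and projects each patch to an embed_dim-dimensional token.
        self.proj = nn.ModuleDict({
            name: nn.Conv2d(c, embed_dim, kernel_size=patch_size, stride=patch_size)
            for name, c in channels.items()
        })

    def forward(self, x, modality):
        tokens = self.proj[modality](x)            # (B, D, H/ps, W/ps)
        return tokens.flatten(2).transpose(1, 2)   # (B, N, D)


embed = ModalityPatchEmbed()
s2 = embed(torch.randn(1, 12, 224, 224), "S2L2A")
s1 = embed(torch.randn(1, 2, 224, 224), "S1RTC")
print(s2.shape, s1.shape)
```

Because every modality ends up with tokens of the same shape, the downstream transformer blocks can be shared across sensors.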

---


## 🔬 Key Features

<p align="center">
  <img src="images/spider_gb.jpg" alt="TerraFM Architecture" width="500"/>
</p>

- **Multimodal Pretraining**: Uses Sentinel-1 (SAR) and Sentinel-2 (L1C, L2A) as natural augmentations.
- **Large-Scale Dataset**: Trained on 18.7M global tiles from the [Major-TOM](https://huggingface.co/Major-TOM) dataset.
- **Cross-Attention Fusion**: Dynamically aggregates information across sensors at patch level.
- **Dual-Centering**: Mitigates long-tailed land cover bias using ESA WorldCover statistics.
- **Benchmark SOTA**: Outperforms prior FMs (Galileo, Prithvi, DOFA) across multiple EO tasks.
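
One way to picture patch-level fusion across sensors is a learnable query that attends over the tokens from each modality at the same patch location. The sketch below is an assumption about how such a layer could look, not TerraFM's actual implementation; the class name `PatchwiseFusion` and the single-query design are hypothetical.

```python
import torch
import torch.nn as nn


class PatchwiseFusion(nn.Module):
    """Hypothetical sketch: fuse M modality tokens per patch with cross-attention."""

    def __init__(self, embed_dim=768, num_heads=8):
        super().__init__()
        self.query = nn.Parameter(torch.zeros(1, 1, embed_dim))
        nn.init.trunc_normal_(self.query, std=0.02)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, tokens):
        # tokens: (B, M, N, D) = batch, modalities, patches, channels
        b, m, n, d = tokens.shape
        # Treat every patch location as its own short sequence of M tokens.
        kv = tokens.permute(0, 2, 1, 3).reshape(b * n, m, d)
        q = self.query.expand(b * n, -1, -1)
        fused, weights = self.attn(q, kv, kv)      # (B*N, 1, D), (B*N, 1, M)
        return fused.reshape(b, n, d), weights.reshape(b, n, m)


fusion = PatchwiseFusion(embed_dim=64, num_heads=4)
tokens = torch.randn(2, 3, 196, 64)                # 3 sensors, 196 patches
fused, weights = fusion(tokens)
print(fused.shape, weights.shape)
```

The returned `weights` sum to one over the modality axis, so they can be read as a per-patch mixing ratio between sensors.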

---
## 🧱 Architecture

<p align="center">
  <img src="images/arch.jpg" alt="TerraFM Architecture" width="700"/>
</p>

Overall architecture of TerraFM. It combines a student-teacher contrastive framework with modality augmentation, cross-attention fusion, and a new dual-centering regularization. TerraFM is built on a ViT backbone, is pre-trained on 18.7M globally distributed samples, and uses large-tile inputs to encode broader spatial context. For illustration, RGB channels from S2-L2A and S2-L1C are selected, and S1 is visualized using a false-color RGB composite.
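
For readers unfamiliar with student-teacher training, the sketch below shows a generic DINO-style step: the teacher output is centered and sharpened, the student matches it, and the teacher then follows the student through an exponential moving average. This is the standard single-center recipe and is given only as background. It does not implement TerraFM's dual-centering, and the function names, temperatures, and momentum values are assumptions.

```python
import copy

import torch
import torch.nn as nn
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, center,
                      t_student=0.1, t_teacher=0.04):
    # Center and sharpen the teacher distribution; no gradient flows through it.
    teacher_probs = F.softmax((teacher_logits - center) / t_teacher, dim=-1).detach()
    student_logp = F.log_softmax(student_logits / t_student, dim=-1)
    return -(teacher_probs * student_logp).sum(dim=-1).mean()


@torch.no_grad()
def ema_update(teacher, student, momentum=0.996):
    # The teacher weights track the student weights as a moving average.
    for p_t, p_s in zip(teacher.parameters(), student.parameters()):
        p_t.mul_(momentum).add_(p_s.detach(), alpha=1.0 - momentum)


@torch.no_grad()
def update_center(center, teacher_logits, momentum=0.9):
    # Running mean of teacher outputs, used to discourage collapse.
    batch_mean = teacher_logits.mean(dim=0, keepdim=True)
    return center * momentum + batch_mean * (1.0 - momentum)


torch.manual_seed(0)
student = nn.Linear(16, 32)            # toy stand-in for the encoder and head
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)
optimizer = torch.optim.SGD(student.parameters(), lr=0.1)

view_a = torch.randn(4, 16)            # stands in for one sensor's view of a tile
view_b = torch.randn(4, 16)            # stands in for another sensor's view
center = torch.zeros(1, 32)

teacher_out = teacher(view_a)
loss = distillation_loss(student(view_b), teacher_out, center)
loss.backward()
optimizer.step()
center = update_center(center, teacher_out)
ema_update(teacher, student)
print(float(loss))
```

Feeding the two networks different sensor views of the same location is what "modality augmentation" refers to in this toy setup.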

---
## 🧠 Model Zoo

| Model | Modality | Input Size | Backbone | Link |
|-------|----------|------------|--------|------|
| TerraFM-B | Sentinel-1 RTC + Sentinel-2 Level 2A + Sentinel-2 Level 1C | 224×224 | ViT-Base | [Download](https://huggingface.co/MBZUAI/TerraFM) |
| TerraFM-L | Sentinel-1 RTC + Sentinel-2 Level 2A + Sentinel-2 Level 1C | 224×224 | ViT-Large | [Download](https://huggingface.co/MBZUAI/TerraFM) |

---

## 🛠 Usage

TerraFM can be used directly via the `terrafm.py` module, which provides standalone implementations of the TerraFM-Base and TerraFM-Large models for easy integration into any codebase.

```python
from terrafm import terrafm_base, terrafm_large
import torch

# Simulated input: 1 sample, 12 channels, 224×224 resolution (e.g., Sentinel-2 L2A)
x = torch.randn(1, 12, 224, 224)

# Build the TerraFM-Base model (use terrafm_large() for TerraFM-Large)
model = terrafm_base()

# Load pretrained weights (e.g., TerraFM-B.pth)
state_dict = torch.load("TerraFM-B.pth", map_location="cpu")
msg = model.load_state_dict(state_dict, strict=False)
# strict=False ignores mismatched keys, so print the missing/unexpected keys
print(msg)

# Forward pass in eval mode, without gradient tracking
model.eval()
with torch.no_grad():
    y = model(x)
print(f"Output shape: {y.shape}")
```
---


## 📊 Results

### 🔍 k-NN Classification Results

We evaluate image classification using k-nearest neighbors (kNN) and report Top-1 accuracy for all single-label tasks. For the multilabel BigEarthNet benchmark, we report the F1 score.

| Model           | Backbone   | m-EuroSat (100%) | m-EuroSat (1%) | m-BigEarthNet (100%) | m-BigEarthNet (1%) | m-So2Sat (100%) | m-So2Sat (1%) | m-Brick-Kiln (100%) | m-Brick-Kiln (1%) |
|----------------|------------|------------------|----------------|------------------------|--------------------|------------------|----------------|----------------------|--------------------|
| SatMAE         | ViT-Base   | 84.1             | 34.8           | 50.6                   | 29.0               | 36.0             | 23.1           | 86.1                 | 73.5               |
| SatMAE++       | ViT-Large  | 82.7             | 48.5           | 50.8                   | 31.6               | 34.7             | 23.4           | 89.6                 | 76.7               |
| CROMA          | ViT-Base   | 85.6             | 51.3           | 58.8                   | 44.7               | 48.8             | 33.8           | 92.6                 | 85.1               |
| SoftCon        | ViT-Small  | 89.8             | 27.2           | 64.7                   | 43.3               | 51.1             | 31.4           | 89.2                 | 77.8               |
| DOFA           | ViT-Base   | 82.8             | 49.6           | 49.4                   | 29.9               | 41.4             | 29.4           | 88.3                 | 78.3               |
| Satlas         | Swin-Tiny  | 81.7             | 35.8           | 51.9                   | 29.6               | 36.6             | 27.1           | 88.2                 | 73.0               |
| MMEarth        | CNN-atto   | 81.7             | 30.0           | 58.3                   | 39.6               | 39.8             | 25.1           | 89.4                 | 79.7               |
| DeCUR          | ViT-Small  | 89.0             | 46.6           | 63.8                   | 49.6               | 45.8             | 30.9           | 83.7                 | 74.2               |
| AnySat         | ViT-Base   | 82.2             | 47.1           | 54.9                   | 33.7               | 39.8             | 29.0           | 85.3                 | 72.0               |
| Galileo        | ViT-Base   | 93.0             | 56.6           | 59.0                   | 36.5               | 54.8             | **43.2**       | 90.7                 | 78.0               |
| Prithvi-2.0    | ViT-Large  | 80.2             | 48.0           | 49.4                   | 28.8               | 29.5             | 26.1           | 87.9                 | 80.6               |
| Copernicus-FM  | ViT-Base   | 76.0             | 47.4           | 53.8                   | 33.3               | 38.4             | 23.3           | 93.0                 | 83.2               |
| **TerraFM**    | ViT-Base   | _94.2_           | _59.3_         | _68.7_                 | 49.4               | _55.1_           | _41.6_         | **94.5**             | **85.6**           |
| **TerraFM**    | ViT-Large  | **95.1**         | **62.1**       | **69.4**               | **50.6**           | **55.9**         | 41.1           | _93.0_               | 82.2               |
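
The k-NN protocol above can be sketched as follows for the single-label case: embeddings from the frozen encoder are compared by cosine similarity and the majority label among the nearest neighbors is taken as the prediction. The helper `knn_predict`, the choice of cosine similarity, and the value of `k` are assumptions for illustration. The exact evaluation settings are not stated in this README.

```python
import numpy as np


def knn_predict(train_feats, train_labels, test_feats, k=5, num_classes=None):
    """Majority-vote k-NN on L2-normalized features (cosine similarity)."""
    train = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    test = test_feats / np.linalg.norm(test_feats, axis=1, keepdims=True)
    sims = test @ train.T                              # (n_test, n_train)
    nn_idx = np.argsort(-sims, axis=1)[:, :k]          # k most similar per row
    nn_labels = train_labels[nn_idx]                   # (n_test, k)
    if num_classes is None:
        num_classes = int(train_labels.max()) + 1
    votes = np.stack(
        [(nn_labels == c).sum(axis=1) for c in range(num_classes)], axis=1
    )
    return votes.argmax(axis=1)


# Toy features standing in for frozen encoder embeddings: two separated clusters.
rng = np.random.default_rng(0)
class0 = np.array([5.0, 0.0]) + 0.1 * rng.standard_normal((10, 2))
class1 = np.array([0.0, 5.0]) + 0.1 * rng.standard_normal((10, 2))
train_feats = np.concatenate([class0, class1])
train_labels = np.array([0] * 10 + [1] * 10)

test_feats = np.array([[4.0, 0.5], [0.5, 4.0]])
test_labels = np.array([0, 1])

pred = knn_predict(train_feats, train_labels, test_feats, k=5)
top1 = float((pred == test_labels).mean())
print(pred, top1)
```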


### 🛰 Copernicus-Bench

Comparison of TerraFM with existing supervised and self-supervised methods on **Copernicus-Bench**.  
Metrics include **OA** (Overall Accuracy), **mAP** (mean Average Precision), and **mIoU** (mean Intersection over Union).

| Dataset         | Metric | Supervised | Random | SoftCon | CROMA | DOFA | Copernicus-FM | **TerraFM** |
|----------------|--------|------------|--------|---------|--------|------|----------------|-------------|
| **Backbone**    | --     | ViT-B/16   | ViT-B/16 | ViT-B/14 | ViT-B/8 | ViT-B/16 | ViT-B/16      | ViT-B/16    |
| **Cloud-S2**       | mIoU  | 59.4       | 60.4   | 66.9    | 65.0   | 65.0 | 66.7          | **67.9**    |
| **EuroSAT-S1**     | OA    | 81.5       | 75.4   | 83.6    | 83.9   | 81.7 | 87.2          | **87.8**    |
| **EuroSAT-S2**     | OA    | 97.6       | 92.5   | 96.7    | 97.0   | 97.2 | 97.9          | **99.1**    |
| **BigEarthNet-S1** | mAP   | 70.6       | 63.8   | **78.7**| 70.8   | 70.5 | 77.9          | 76.9        |
| **BigEarthNet-S2** | mAP   | 80.1       | 71.6   | 83.6    | 76.4   | 75.5 | 79.0          | **84.4**    |
| **DFC2020-S1**     | mIoU  | 50.8       | 45.4   | 52.8    | 52.7   | 49.7 | 52.4          | **55.4**    |
| **DFC2020-S2**     | mIoU  | 66.2       | 62.3   | 64.1    | **66.5**| 61.8 | 64.5          | 63.8        |
| **LCZ-S2**         | OA    | 85.3       | 77.4   | 83.6    | 84.1   | 83.0 | 84.4          | **87.0**    |
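
For reference, OA and mIoU can both be derived from a confusion matrix, as in the minimal sketch below. The helper names are hypothetical, and benchmark implementations may differ in details such as ignored labels. mAP is not covered here.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    idx = y_true.ravel() * num_classes + y_pred.ravel()
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def overall_accuracy(cm):
    return np.trace(cm) / cm.sum()


def mean_iou(cm):
    tp = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    valid = union > 0                      # skip classes absent from both inputs
    return (tp[valid] / union[valid]).mean()


y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
cm = confusion_matrix(y_true, y_pred, num_classes=3)
oa = overall_accuracy(cm)
miou = mean_iou(cm)
print(cm)
print(round(float(oa), 4), round(float(miou), 4))   # 0.6667 0.5
```

In the toy example the per-class IoU values are 1/3, 2/3, and 1/2, which average to 0.5. For segmentation, `y_true` and `y_pred` would be flattened label maps.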

### 🧪 GEO-Bench Performance

Performance comparison on GEO-Bench, reporting Top-1 accuracy for **classification**, mIoU for **segmentation**, and **F1 score** for m-BigEarthNet.  
TerraFM achieves state-of-the-art results across multiple datasets, outperforming previous foundation models.

| Method       | Backbone   | m-EuroSat | m-BigEarthNet | m-So2Sat | m-Brick-Kiln | m-Cashew-Plant | m-SA-Crop-Type |
|--------------|------------|-----------|----------------|----------|----------------|------------------|------------------|
| SatMAE       | ViT-Large  | 96.6      | 68.3           | 57.2     | 98.4           | 30.8             | 24.8             |
| SatMAE++     | ViT-Large  | 96.5      | 67.9           | 56.0     | 98.6           | 29.6             | 25.7             |
| CROMA        | ViT-Large  | 96.6      | 71.9           | 60.6     | 98.7           | 31.8             | 32.0             |
| SoftCon      | ViT-Base   | 97.5      | 70.3           | 61.7     | 98.7           | 29.6             | 30.8             |
| DOFA         | ViT-Large  | 96.9      | 68.0           | 58.7     | 98.6           | 27.7             | 25.4             |
| Satlas       | Swin-Base  | 97.5      | 72.8           | 61.9     | 98.9           | 25.1             | 23.4             |
| MMEarth      | CNN-atto   | 95.7      | 70.0           | 57.2     | 98.9           | 24.2             | 22.2             |
| DeCUR        | ViT-Small  | 97.9      | 70.9           | 61.7     | 98.7           | 26.2             | 21.5             |
| Prithvi 2.0  | ViT-Large  | 96.5      | 69.0           | 54.6     | 98.6           | 26.7             | 22.9             |
| AnySat       | ViT-Base   | 95.9      | 70.3           | 51.8     | 98.6           | 26.1             | 27.1             |
| Galileo      | ViT-Base   | 97.7      | 70.7           | 63.3     | 98.7           | 33.0             | 30.1             |
| **TerraFM**  | ViT-Base   | *98.1*    | 72.6           | *64.9*   | 98.7           | *34.1*           | *33.0*           |
| **TerraFM**  | ViT-Large  | **98.6**  | **73.1**       | **66.6** | **99.0**       | **37.2**         | **34.5**         |
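
As a reminder of how a multilabel F1 score is computed, the sketch below uses micro-averaging over all sample-label pairs. The averaging mode is an assumption, since this README does not say which variant is reported for m-BigEarthNet, and the function name is hypothetical.

```python
import numpy as np


def micro_f1(y_true, y_pred):
    """Micro-averaged F1 over binary indicator matrices of shape (samples, labels)."""
    tp = np.logical_and(y_true == 1, y_pred == 1).sum()
    fp = np.logical_and(y_true == 0, y_pred == 1).sum()
    fn = np.logical_and(y_true == 1, y_pred == 0).sum()
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0


y_true = np.array([[1, 0, 1],
                   [0, 1, 0]])
y_pred = np.array([[1, 1, 0],
                   [0, 1, 0]])
score = micro_f1(y_true, y_pred)
print(round(score, 4))   # 0.6667
```

Here there are two true positives, one false positive, and one false negative, giving 4 / 6.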


### 🌋 Landslide Detection (Landslide4Sense)

Landslide detection performance on the **Landslide4Sense** test set.  
Despite having significantly fewer parameters (120M vs. 300M), **TerraFM** achieves higher overall segmentation performance, especially for landslide regions.

| Model                  | mIoU | IoU (Landslide) |
|------------------------|------|-----------------|
| Prithvi-EO-2.0 (300M)  | 65.0 | 31.5            |
| **TerraFM (120M)**     | **70.8** | **43.1**     |

<p align="center">
  <img src="images/ls4s_qual.jpg" alt="Landslide Detection" width="700"/>
</p>

---

## 📜 Citation
If you find our work and this repository useful, please consider giving our repo a star and citing our paper as follows:
```bibtex
@article{danish2025terrafmscalablefoundationmodel,
      title={TerraFM: A Scalable Foundation Model for Unified Multisensor Earth Observation}, 
      author={Muhammad Sohail Danish and Muhammad Akhtar Munir and Syed Roshaan Ali Shah and Muhammad Haris Khan and Rao Muhammad Anwer and Jorma Laaksonen and Fahad Shahbaz Khan and Salman Khan},
      year={2025},
      eprint={2506.06281},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2506.06281}, 
}
```




## 📨 Contact
If you have any questions, please create an issue on this repository or contact us at [email protected].