CLIPSeg Fine-tuned for Cloud Segmentation (Full Fine-Tuning, 100% Data)

Fine-tuned version of CIDAS/clipseg-rd64-refined for cloud segmentation on Sentinel-2 satellite imagery using the CloudSEN12+ dataset. This model is part of the research presented in:

Low-Data Supervised Adaptation Outperforms Prompting for Cloud Segmentation Under Domain Shift
Harshith Kethavath, Weiming Hu
EarthVision Workshop @ CVPR 2026

All models from this paper: https://huggingface.co/collections/uga-gaim/2026-cloudprompts

Model Description

CLIPSeg is a vision-language segmentation model trained on natural images. This variant is fully fine-tuned on CloudSEN12+ to adapt it to Sentinel-2 satellite imagery for four-class cloud segmentation: clear, thick cloud, thin cloud, and cloud shadow.

  • Developed by: Harshith Kethavath, Weiming Hu
  • Lab: Lab for Geoinformatics and AI Modeling (GAIM), University of Georgia
  • License: CC BY 4.0
  • Base model: CIDAS/clipseg-rd64-refined

How to Get Started

from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation
import torch
from PIL import Image

processor = CLIPSegProcessor.from_pretrained("uga-gaim/CLIPSeg-CloudSEN12Plus-FFT")
model = CLIPSegForImageSegmentation.from_pretrained("uga-gaim/CLIPSeg-CloudSEN12Plus-FFT")

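# Convert to 3-channel RGB; PNGs with an alpha channel or a palette would otherwise break preprocessing.
image = Image.open("your_sentinel2_image.png").convert("RGB")
# The prompt order defines the class index returned by argmax below.
prompts = ["clear", "thick cloud", "thin cloud", "cloud shadow"]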

inputs = processor(
    text=prompts,
    images=[image] * len(prompts),
    return_tensors="pt",
    padding=True
)

with torch.no_grad():
    outputs = model(**inputs)

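# One logit map per prompt: shape (4, H, W) at the processor's output resolution,
# which is generally not the original image size.
logits = outputs.logits
predicted_class = logits.argmax(dim=0)  # per-pixel index into `prompts`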
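
To inspect a prediction visually, the class-index map can be turned into an RGB image and summarized per class. The following is a minimal sketch, not part of the released code: the palette colors and the helper names `colorize` and `class_fractions` are hypothetical, and the small `demo` array stands in for `predicted_class.cpu().numpy()` from the snippet above.

```python
import numpy as np

# Class order follows the prompt list used above; index i corresponds to prompts[i].
CLASS_NAMES = ["clear", "thick cloud", "thin cloud", "cloud shadow"]

# Hypothetical display colors (RGB), chosen only for illustration.
PALETTE = np.array(
    [
        [34, 139, 34],    # clear
        [255, 255, 255],  # thick cloud
        [173, 216, 230],  # thin cloud
        [64, 64, 64],     # cloud shadow
    ],
    dtype=np.uint8,
)


def colorize(class_map: np.ndarray) -> np.ndarray:
    """Map an (H, W) array of class indices to an (H, W, 3) uint8 RGB image."""
    return PALETTE[class_map]


def class_fractions(class_map: np.ndarray) -> dict:
    """Return the fraction of pixels assigned to each class."""
    counts = np.bincount(class_map.ravel(), minlength=len(CLASS_NAMES))
    return {name: float(c) / class_map.size for name, c in zip(CLASS_NAMES, counts)}


# Stand-in for predicted_class.cpu().numpy(); a real map would be much larger.
demo = np.array([[0, 1], [2, 3]])
rgb = colorize(demo)
fractions = class_fractions(demo)
```

The resulting `rgb` array can be saved with `PIL.Image.fromarray(rgb)`. If the mask needs to match the original image size, resizing the class map with nearest-neighbor interpolation avoids blending class indices.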

Training Details

Training Data

Trained on the CloudSEN12+ dataset, the largest expert-labeled cloud segmentation benchmark for Sentinel-2 imagery. 100% of the training split was used (full data setting).

Training Hyperparameters

| Hyperparameter | Value |
|----------------|-------|
| Optimizer      | AdamW |
| Learning rate  | 5e-5  |
| Weight decay   | 0.02  |
| Warmup ratio   | 0.06  |
| Epochs         | 20    |
| Batch size     | 16    |
| Precision      | fp16  |
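
As a rough illustration of what a warmup ratio of 0.06 means in practice, the sketch below converts it into a step count and a learning-rate value per step. It rests on assumptions: the card does not state the total number of optimizer steps or the schedule after warmup, so `TOTAL_STEPS` is a hypothetical value and the learning rate is simply held at its peak once warmup ends.

```python
# Values taken from the hyperparameter table above.
PEAK_LR = 5e-5
WARMUP_RATIO = 0.06


def warmup_steps(total_steps: int, warmup_ratio: float = WARMUP_RATIO) -> int:
    """Number of optimizer steps spent ramping the learning rate up."""
    return round(total_steps * warmup_ratio)


def lr_at(step: int, total_steps: int, peak_lr: float = PEAK_LR) -> float:
    """Linear warmup to peak_lr, then constant.

    The post-warmup schedule is an assumption; the card does not specify it.
    """
    n_warm = warmup_steps(total_steps)
    if n_warm > 0 and step < n_warm:
        return peak_lr * (step + 1) / n_warm
    return peak_lr


# Hypothetical run length, for illustration only.
TOTAL_STEPS = 10_000
n_warm = warmup_steps(TOTAL_STEPS)
first_lr = lr_at(0, TOTAL_STEPS)
late_lr = lr_at(5_000, TOTAL_STEPS)
```

With a real run, `TOTAL_STEPS` would be the number of batches per epoch times the 20 epochs listed above.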

Loss Function

Combined segmentation loss: weighted sum of Focal loss, Tversky loss, and Boundary loss.
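
For reference, one common way to write such a combined objective is given below. This is a sketch under assumptions: the weights $\lambda$, the focusing parameter $\gamma$, and the Tversky coefficients $\alpha, \beta$ are not stated in this card, the Focal and Tversky terms are shown in their standard forms, and the Boundary term is left abstract because several formulations exist. The paper's exact definitions may differ.

```latex
% p_{i,c}: predicted probability of class c at pixel i
% g_{i,c}: one-hot ground truth, y_i: true class at pixel i
% N: number of pixels, C: number of classes (here C = 4)
\mathcal{L} = \lambda_{F}\,\mathcal{L}_{\mathrm{Focal}}
            + \lambda_{T}\,\mathcal{L}_{\mathrm{Tversky}}
            + \lambda_{B}\,\mathcal{L}_{\mathrm{Boundary}}

\mathcal{L}_{\mathrm{Focal}} = -\frac{1}{N}\sum_{i=1}^{N}
    \left(1 - p_{i,y_i}\right)^{\gamma} \log p_{i,y_i}

\mathcal{L}_{\mathrm{Tversky}} = 1 - \frac{1}{C}\sum_{c=1}^{C}
    \frac{\sum_i p_{i,c}\, g_{i,c}}
         {\sum_i p_{i,c}\, g_{i,c}
          + \alpha \sum_i p_{i,c}\,(1 - g_{i,c})
          + \beta  \sum_i (1 - p_{i,c})\, g_{i,c}}
```

In this form the Focal term down-weights easy pixels, and the Tversky term trades off false positives against false negatives through $\alpha$ and $\beta$, which is a common choice when some classes are small or sparse.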

Evaluation Results

Evaluated on the CloudSEN12+ test split. Per-class IoU:

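| Class        | Zero-Shot (baseline) | This model (FFT 100%) |
|--------------|----------------------|-----------------------|
| Clear        | 0.5205               | 0.8540                |
| Thick Cloud  | 0.2773               | 0.7875                |
| Thin Cloud   | 0.0898               | 0.4702                |
| Cloud Shadow | 0.1325               | 0.5173                |
| mIoU         | 0.2550               | 0.6572                |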
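
The sketch below shows how per-class IoU can be computed from a predicted and a reference class map, and checks that the reported mIoU is consistent with the mean of the four per-class values. It is an illustration only: `per_class_iou` is a hypothetical helper, the toy arrays are made up, and whether the paper accumulates intersections and unions over the whole test split or averages per image is not stated here.

```python
import numpy as np


def per_class_iou(pred: np.ndarray, target: np.ndarray, num_classes: int) -> list:
    """Intersection over union for each class index in [0, num_classes)."""
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        inter = np.logical_and(pred_c, target_c).sum()
        union = np.logical_or(pred_c, target_c).sum()
        # A class absent from both maps has no defined IoU.
        ious.append(float(inter) / float(union) if union > 0 else float("nan"))
    return ious


# Toy 2x3 maps, made up for illustration (0=clear, 1=thick, 2=thin, 3=shadow).
pred = np.array([[0, 0, 1], [2, 3, 3]])
target = np.array([[0, 1, 1], [2, 2, 3]])
toy_ious = per_class_iou(pred, target, num_classes=4)

# Per-class IoU values for this model, copied from the table above.
reported = [0.8540, 0.7875, 0.4702, 0.5173]
miou_from_table = sum(reported) / len(reported)
# This agrees with the reported mIoU of 0.6572 up to rounding of the per-class values.
```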

Citation

@InProceedings{Kethavath_2026_CVPR,
    author    = {Kethavath, Harshith and Hu, Weiming},
    title     = {Low-Data Supervised Adaptation Outperforms Prompting for Cloud Segmentation Under Domain Shift},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2026},
    pages     = {7960-7969}
}