prithivMLmods/Multilabel-GeoSceneNet

@@ -4,6 +4,9 @@ datasets:
 - prithivMLmods/Multilabel-GeoSceneNet-16K
 library_name: transformers
 ---
+# **Multilabel-GeoSceneNet**
+> **Multilabel-GeoSceneNet** is a vision-language encoder model fine-tuned from **google/siglip2-base-patch16-224** for **multi-label** image classification. It is designed to recognize and label multiple geographic or environmental elements in a single image using the **SiglipForImageClassification** architecture.
 ```py
 Classification Report:
@@ -22,4 +25,93 @@ Buildings and Structures     0.8881    0.9498    0.9179      2190
             weighted avg     0.9253    0.9245    0.9244     16033
 ```
-![download.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/Ld-vFb2MWg43wAG5pyFZb.png)
+![download.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/Ld-vFb2MWg43wAG5pyFZb.png)
+---
+The model predicts the presence of one or more of the following **7 geographic scene categories**:
+```
+    Class 0: "Buildings and Structures"
+    Class 1: "Desert"
+    Class 2: "Forest Area"
+    Class 3: "Hill or Mountain"
+    Class 4: "Ice Glacier"
+    Class 5: "Sea or Ocean"
+    Class 6: "Street View"
+```
+---
+## **Install dependencies**
+```python
+!pip install -q transformers torch pillow gradio
+```
+---
+## **Inference Code**
+```python
+import gradio as gr
+from transformers import AutoImageProcessor, SiglipForImageClassification
+from PIL import Image
+import torch
+# Load model and processor
+model_name = "prithivMLmods/Multilabel-GeoSceneNet"
+model = SiglipForImageClassification.from_pretrained(model_name)
+processor = AutoImageProcessor.from_pretrained(model_name)
+def classify_geoscene_image(image):
+    """Predict geographic scene labels for an input image."""
+    if image is None:  # Gradio passes None when no image is uploaded
+        return {"None Detected": 0.0}
+    image = Image.fromarray(image).convert("RGB")
+    inputs = processor(images=image, return_tensors="pt")
+    with torch.no_grad():
+        outputs = model(**inputs)
+        logits = outputs.logits
+        probs = torch.sigmoid(logits).squeeze().tolist()  # Sigmoid for multilabel
+    labels = {
+        0: "Buildings and Structures",
+        1: "Desert",
+        2: "Forest Area",
+        3: "Hill or Mountain",
+        4: "Ice Glacier",
+        5: "Sea or Ocean",
+        6: "Street View"
+    }
+    threshold = 0.5
+    predictions = {
+        labels[i]: round(p, 3)
+        for i, p in enumerate(probs) if p >= threshold
+    }
+    return predictions or {"None Detected": 0.0}
+# Create Gradio interface
+iface = gr.Interface(
+    fn=classify_geoscene_image,
+    inputs=gr.Image(type="numpy"),
+    outputs=gr.Label(label="Predicted Scene Categories"),
+    title="Multilabel-GeoSceneNet",
+    description="Upload an image to detect multiple geographic scene elements (e.g., forest, ocean, buildings)."
+)
+if __name__ == "__main__":
+    iface.launch()
+```
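The sigmoid-and-threshold step inside `classify_geoscene_image` can be looked at in isolation, without the model or Gradio. The following is a minimal sketch in plain Python, not the author's code: `decode_multilabel` and the example logits are hypothetical, and the label order is copied from the class list in this card.

```python
import math

# Class order copied from the model card's label list.
LABELS = [
    "Buildings and Structures",
    "Desert",
    "Forest Area",
    "Hill or Mountain",
    "Ice Glacier",
    "Sea or Ocean",
    "Street View",
]


def sigmoid(x: float) -> float:
    """Map a raw logit to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_multilabel(logits, threshold=0.5):
    """Return {label: probability} for every class at or above the threshold."""
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    probs = [sigmoid(z) for z in logits]
    return {
        label: round(p, 3)
        for label, p in zip(LABELS, probs)
        if p >= threshold
    }


# Hypothetical logits for a coastal town photo (illustrative values only).
example_logits = [2.0, -3.0, -1.0, -2.0, -4.0, 1.0, -0.5]
print(decode_multilabel(example_logits))
# {'Buildings and Structures': 0.881, 'Sea or Ocean': 0.731}
```

Because each class gets its own sigmoid rather than a shared softmax, several labels can pass the threshold at once, or none at all. A threshold of 0.5 is equivalent to keeping every class whose logit is non-negative; raising it would likely trade recall for precision.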
+---
+## **Intended Use**
+The **Multilabel-GeoSceneNet** model is suitable for recognizing multiple geographic and structural elements in a single image. Use cases include:
+- **Remote Sensing:** Label elements in satellite or drone imagery.
+- **Geographic Tagging:** Auto-tag images for search or sorting.
+- **Environmental Monitoring:** Identify features like glaciers or forests.
+- **Scene Understanding:** Help autonomous systems interpret complex scenes.
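
As one way to picture the geographic tagging use case, per-image predictions can be folded into a tag-to-files index for search or sorting. This is a hedged sketch rather than part of the model card: `build_tag_index` is a hypothetical helper, and the file names and scores are made up. It assumes each image's output has the same `{label: probability}` shape that `classify_geoscene_image` returns.

```python
from collections import defaultdict


def build_tag_index(predictions, min_score=0.5):
    """Map each label to the files tagged with it, highest score first."""
    index = defaultdict(list)
    for filename, scores in predictions.items():
        for label, score in scores.items():
            if score >= min_score:
                index[label].append((score, filename))
    # Sort by descending score, breaking ties by file name.
    return {
        label: [name for _, name in sorted(hits, key=lambda h: (-h[0], h[1]))]
        for label, hits in index.items()
    }


# Hypothetical per-image outputs; file names and scores are illustrative only.
predictions = {
    "alps.jpg": {"Hill or Mountain": 0.94, "Ice Glacier": 0.71},
    "harbor.jpg": {"Sea or Ocean": 0.88, "Buildings and Structures": 0.65},
    "fjord.jpg": {"Sea or Ocean": 0.91, "Hill or Mountain": 0.58},
}

index = build_tag_index(predictions)
print(index["Sea or Ocean"])      # ['fjord.jpg', 'harbor.jpg']
print(index["Hill or Mountain"])  # ['alps.jpg', 'fjord.jpg']
```

The `min_score` filter also drops the `{"None Detected": 0.0}` placeholder, so images with no confident label do not appear under any tag.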