prithivMLmods committed 3e1ca13 (verified) · 1 parent: 128266a

Update README.md

Files changed (1): README.md (+93 -1)
datasets:
- prithivMLmods/Multilabel-GeoSceneNet-16K
library_name: transformers
---

# **Multilabel-GeoSceneNet**

> **Multilabel-GeoSceneNet** is a vision-language encoder model fine-tuned from **google/siglip2-base-patch16-224** for **multi-label** image classification. It recognizes and labels multiple geographic or environmental elements in a single image, using the **SiglipForImageClassification** architecture.

```py
Classification Report:
...
Buildings and Structures 0.8881 0.9498 0.9179 2190
...
weighted avg 0.9253 0.9245 0.9244 16033
```
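The `weighted avg` row averages each per-class metric, weighting each class by its support (its number of true samples). This is how scikit-learn's `classification_report` computes that row. The sketch below shows the calculation. It assumes each class row is a `(precision, recall, f1, support)` tuple. The two classes and their numbers are made up for illustration and are not this model's results.

```python
from typing import Dict, Tuple

# One report row: (precision, recall, f1_score, support)
Row = Tuple[float, float, float, int]


def weighted_average(rows: Dict[str, Row]) -> Row:
    """Average each metric across classes, weighting by class support."""
    total = sum(row[3] for row in rows.values())
    if total <= 0:
        raise ValueError("total support must be positive")

    def weighted(index: int) -> float:
        # Each class contributes in proportion to how many true samples it has
        return sum(row[index] * row[3] for row in rows.values()) / total

    # The support column of the averaged row is the total support
    return weighted(0), weighted(1), weighted(2), total


# Illustrative rows only (not taken from this model's evaluation)
example = {
    "class_a": (0.90, 0.80, 0.85, 300),
    "class_b": (0.60, 0.70, 0.65, 100),
}
precision, recall, f1, support = weighted_average(example)
print(f"weighted avg {precision:.4f} {recall:.4f} {f1:.4f} {support}")
```

The weighted F1 is an average of the per-class F1 scores. It is not the harmonic mean of the weighted precision and recall, so it can fall slightly below both, as it does in the report above.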

![download.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/Ld-vFb2MWg43wAG5pyFZb.png)

---

The model predicts the presence of one or more of the following **7 geographic scene categories**:
```
Class 0: "Buildings and Structures"
Class 1: "Desert"
Class 2: "Forest Area"
Class 3: "Hill or Mountain"
Class 4: "Ice Glacier"
Class 5: "Sea or Ocean"
Class 6: "Street View"
```
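Several categories can appear in the same image, such as a street lined with buildings. For that reason, a multi-label model scores each class with its own independent sigmoid instead of letting the classes compete in a softmax.

The plain-Python sketch below illustrates the difference, using hypothetical logits rather than real model output. With sigmoids, more than one class can clear the 0.5 threshold used in the inference code. Softmax scores always sum to 1, so at most one class can exceed 0.5.

```python
import math
from typing import Dict, List

# Class order as listed in this model card
LABELS = [
    "Buildings and Structures",
    "Desert",
    "Forest Area",
    "Hill or Mountain",
    "Ice Glacier",
    "Sea or Ocean",
    "Street View",
]


def sigmoid(x: float) -> float:
    """Independent per-class probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs: List[float]) -> List[float]:
    """Probabilities that compete with each other and sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def multilabel_decode(logits: List[float], threshold: float = 0.5) -> Dict[str, float]:
    """Keep every class whose sigmoid probability reaches the threshold."""
    probs = [sigmoid(z) for z in logits]
    return {label: round(p, 3) for label, p in zip(LABELS, probs) if p >= threshold}


# Hypothetical logits for a city street photo (illustrative values, not model output)
logits = [2.0, -4.0, -3.0, -1.0, -5.0, -3.5, 1.5]

print(multilabel_decode(logits))
print([LABELS[i] for i, p in enumerate(softmax(logits)) if p >= 0.5])
```

In this example, both "Buildings and Structures" and "Street View" pass under sigmoid. Under softmax, only the top class stays above 0.5.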
---

## **Install dependencies**

```bash
pip install -q transformers torch pillow gradio
```

---

## **Inference Code**
```python
import gradio as gr
import torch
from PIL import Image
from transformers import AutoImageProcessor, SiglipForImageClassification

# Load the fine-tuned model and its matching image processor
model_name = "prithivMLmods/Multilabel-GeoSceneNet"
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

# Output index -> scene category name
labels = {
    "0": "Buildings and Structures",
    "1": "Desert",
    "2": "Forest Area",
    "3": "Hill or Mountain",
    "4": "Ice Glacier",
    "5": "Sea or Ocean",
    "6": "Street View",
}

# A category is reported when its probability is at least this value
threshold = 0.5


def classify_geoscene_image(image):
    """Predicts geographic scene labels for an input image."""
    if image is None:  # no image uploaded yet
        return {"None Detected": 0.0}

    # Gradio supplies a NumPy array; convert it to an RGB PIL image
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)
    logits = outputs.logits
    # Sigmoid (not softmax) scores each category independently, as multi-label output requires
    probs = torch.sigmoid(logits).squeeze().tolist()

    # Keep every category whose probability clears the threshold
    predictions = {
        labels[str(i)]: round(probs[i], 3)
        for i in range(len(probs)) if probs[i] >= threshold
    }

    return predictions or {"None Detected": 0.0}


# Create the Gradio interface
iface = gr.Interface(
    fn=classify_geoscene_image,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(label="Predicted Scene Categories"),
    title="Multilabel-GeoSceneNet",
    description="Upload an image to detect multiple geographic scene elements (e.g., forest, ocean, buildings).",
)

if __name__ == "__main__":
    iface.launch()
```
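The thresholding step inside `classify_geoscene_image` can be pulled out into a plain function. That makes it reusable outside Gradio, for example on a batch of images, and lets class names come from a mapping instead of a hard-coded dict.

The sketch below is minimal and rests on a few assumptions:

- Sigmoid probabilities have already been computed, for instance with `torch.sigmoid(outputs.logits).tolist()` on a batch.
- An index-to-name mapping is available. `model.config.id2label` is the usual place for one, but whether this checkpoint stores these exact names there is an assumption.
- `decode_batch` is a hypothetical helper, not part of any library.

```python
from typing import Dict, List, Mapping, Sequence


def decode_batch(
    probs_batch: Sequence[Sequence[float]],
    id2label: Mapping[int, str],
    threshold: float = 0.5,
) -> List[Dict[str, float]]:
    """Turn per-image sigmoid probabilities into {label: score} dicts, highest score first.

    Hypothetical helper: mirrors the thresholding in classify_geoscene_image,
    but works on a batch and takes the label mapping as an argument.
    """
    results = []
    for probs in probs_batch:
        kept = [
            (id2label[i], round(p, 3))
            for i, p in enumerate(probs)
            if p >= threshold
        ]
        # Sort so the most confident category comes first
        kept.sort(key=lambda item: item[1], reverse=True)
        results.append(dict(kept))
    return results


# Usage with made-up probabilities for two images (not real model output)
id2label = {
    0: "Buildings and Structures",
    1: "Desert",
    2: "Forest Area",
    3: "Hill or Mountain",
    4: "Ice Glacier",
    5: "Sea or Ocean",
    6: "Street View",
}
batch = [
    [0.91, 0.02, 0.05, 0.10, 0.01, 0.03, 0.77],
    [0.04, 0.01, 0.62, 0.88, 0.35, 0.07, 0.02],
]
for prediction in decode_batch(batch, id2label):
    print(prediction)
```

The helper returns an empty dict when nothing clears the threshold, rather than the `{"None Detected": 0.0}` placeholder used for the Gradio display. That leaves the caller to decide how to present images with no confident label.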

---

## **Intended Use**

The **Multilabel-GeoSceneNet** model is suitable for recognizing multiple geographic and structural elements in a single image. Use cases include:

- **Remote Sensing:** Label elements in satellite or drone imagery.
- **Geographic Tagging:** Auto-tag images for search or sorting.
- **Environmental Monitoring:** Identify features such as glaciers or forests.
- **Scene Understanding:** Help autonomous systems interpret complex scenes.
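For the geographic tagging use case, per-image predictions can be turned into a simple inverted index, so images can be looked up by tag. The in-memory sketch below uses hypothetical image IDs and predictions. A real system would store the inference output for each image instead. `build_tag_index` and `find_images` are illustrative names, not an existing API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def build_tag_index(predictions: Dict[str, Dict[str, float]]) -> Dict[str, Set[str]]:
    """Map each predicted label to the set of image IDs tagged with it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for image_id, labels in predictions.items():
        for label in labels:
            index[label].add(image_id)
    return dict(index)


def find_images(index: Dict[str, Set[str]], required: Iterable[str]) -> List[str]:
    """Return the sorted image IDs that carry every required tag."""
    tag_sets = [index.get(tag, set()) for tag in required]
    if not tag_sets:
        return []
    return sorted(set.intersection(*tag_sets))


# Hypothetical per-image outputs in the {label: score} format the demo returns
predictions = {
    "img_001.jpg": {"Sea or Ocean": 0.93, "Hill or Mountain": 0.71},
    "img_002.jpg": {"Forest Area": 0.88, "Hill or Mountain": 0.64},
    "img_003.jpg": {"Sea or Ocean": 0.81},
}
index = build_tag_index(predictions)
print(find_images(index, ["Hill or Mountain"]))
print(find_images(index, ["Sea or Ocean", "Hill or Mountain"]))
```

If you feed it the demo's output directly, drop the `"None Detected"` placeholder first so it is not indexed as a tag.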