Update README.md
README.md
CHANGED
|
@@ -4,7 +4,6 @@ license_name: nvidia-open-model-license
|
|
| 4 |
license_link: https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf
|
| 5 |
---
|
| 6 |
|
| 7 |
-
|
| 8 |
# Model Overview
|
| 9 |
|
| 10 |
## Description
|
|
@@ -68,6 +67,41 @@ Huggingface: 03/26/2025 via [RADIO Collection of Models](https://huggingface.co/
|
|
| 68 |
**Output Parameters:** 2D <br>
|
| 69 |
**Other Properties Related to Output:** Downstream model required to leverage image features <br>
|
| 70 |
|
| 71 |
## Software Integration
|
| 72 |
|
| 73 |
**Runtime Engine(s):**
|
|
@@ -192,4 +226,3 @@ Model Application(s): | Generation of visual embe
|
|
| 192 |
Describe the life critical impact (if present). | Not Applicable
|
| 193 |
Use Case Restrictions: | Abide by NVIDIA Open Model License Agreement
|
| 194 |
Model and dataset restrictions: | The Principle of Least Privilege (PoLP) is applied, limiting access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints are adhered to.
|
| 195 |
-
|
|
|
|
| 4 |
license_link: https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf
|
| 5 |
---
|
| 6 |
|
|
|
|
| 7 |
# Model Overview
|
| 8 |
|
| 9 |
## Description
|
|
|
|
| 67 |
**Output Parameters:** 2D <br>
|
| 68 |
**Other Properties Related to Output:** Downstream model required to leverage image features <br>
|
| 69 |
|
| 70 |
+
## Usage
|
| 71 |
+
|
| 72 |
+
RADIO will return a tuple with two tensors.
|
| 73 |
+
The `summary` is similar to the `cls_token` in ViT and is meant to represent the general concept of the entire image.
|
| 74 |
+
It has shape `(B,C)` with `B` being the batch dimension, and `C` being some number of channels.
|
| 75 |
+
The `spatial_features` represent more localized content which should be suitable for dense tasks such as semantic segmentation, or for integration into an LLM.
|
| 76 |
+
|
| 77 |
+
```python
|
| 78 |
+
import torch
|
| 79 |
+
from PIL import Image
|
| 80 |
+
from transformers import AutoModel, CLIPImageProcessor
|
| 81 |
+
|
| 82 |
+
hf_repo = "nvidia/C-RADIOv2-B"
|
| 83 |
+
|
| 84 |
+
image_processor = CLIPImageProcessor.from_pretrained(hf_repo)
|
| 85 |
+
model = AutoModel.from_pretrained(hf_repo, trust_remote_code=True)
|
| 86 |
+
model.eval().cuda()
|
| 87 |
+
|
| 88 |
+
image = Image.open('./assets/radio.png').convert('RGB')
|
| 89 |
+
pixel_values = image_processor(images=image, return_tensors='pt', do_resize=True).pixel_values
|
| 90 |
+
pixel_values = pixel_values.cuda()
|
| 91 |
+
|
| 92 |
+
summary, spatial_features = model(pixel_values)
|
| 93 |
+
```
|
| 94 |
+
|
| 95 |
+
Spatial features have shape `(B,T,D)` with `T` being the number of flattened spatial tokens, and `D` being the channels for spatial features. Note that `C!=D` in general.
|
| 96 |
+
Converting to a spatial tensor format can be done using the downsampling size of the model, combined with the input tensor shape. For RADIO, the patch size is 16.
|
| 97 |
+
|
| 98 |
+
```python
|
| 99 |
+
from einops import rearrange
|
| 100 |
+
spatial_features = rearrange(spatial_features, 'b (h w) d -> b d h w', h=pixel_values.shape[-2] // 16, w=pixel_values.shape[-1] // 16)  # 16 is the RADIO patch size
|
| 101 |
+
```
|
| 102 |
+
|
| 103 |
+
The resulting tensor will have shape `(B,D,H,W)`, as is typically seen with computer vision models.
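
The grid arithmetic behind that reshape can be sketched in plain Python. This is only an illustration, not part of the RADIO API: the helper names `token_grid` and `check_token_count` are hypothetical, and the sketch assumes the input height and width are exact multiples of the patch size and that `spatial_features` contains only patch tokens, so that `T == h * w`.

```python
from typing import Tuple


def token_grid(height: int, width: int, patch_size: int = 16) -> Tuple[int, int]:
    """Return (h, w): the number of patch rows and columns for an input image.

    Hypothetical helper; assumes height and width are multiples of patch_size.
    """
    if height % patch_size or width % patch_size:
        raise ValueError("input height and width must be multiples of patch_size")
    return height // patch_size, width // patch_size


def check_token_count(num_tokens: int, height: int, width: int,
                      patch_size: int = 16) -> Tuple[int, int]:
    """Confirm that T == h * w before reshaping (B, T, D) into (B, D, H, W)."""
    h, w = token_grid(height, width, patch_size)
    if num_tokens != h * w:
        raise ValueError(f"expected {h * w} tokens, got {num_tokens}")
    return h, w


# Example: a 224x224 input with patch size 16 gives a 14x14 grid, so T = 196.
h, w = check_token_count(196, 224, 224)
print(h, w)  # 14 14
```

The returned `h` and `w` are the values that would be passed to `rearrange` in place of the inline floor divisions; checking the token count first turns a silent shape mismatch into an explicit error.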
|
| 104 |
+
|
| 105 |
## Software Integration
|
| 106 |
|
| 107 |
**Runtime Engine(s):**
|
|
|
|
| 226 |
Describe the life critical impact (if present). | Not Applicable
|
| 227 |
Use Case Restrictions: | Abide by NVIDIA Open Model License Agreement
|
| 228 |
Model and dataset restrictions: | The Principle of Least Privilege (PoLP) is applied, limiting access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints are adhered to.
|
|
|