---
license: gpl-3.0
language:
- en
base_model:
- lmms-lab/llava-onevision-qwen2-7b-ov
pipeline_tag: image-text-to-text
tags:
- radioastronomy
---
# radiollava-7b-qa

Paper: https://arxiv.org/abs/2503.23859

radiollava is a domain-specialized vision-language AI assistant for radio astronomy research, designed in particular for
radio source analysis tasks on radio-continuum images. It was trained on ~1.5M user-assistant conversations about ~55k radio
images taken from several radio surveys, including ASKAP-EMU, MeerKAT SMGPS and VLA FIRST.

## Model Details

- **Base Architecture**: llava-onevision
- **Base Model**: llava-onevision-qwen2-7b-ov
- **Parameters**: 7 billion
- **Domain**: Radio Astronomy
- **License**: GPL 3.0
- **Development Process**: Supervised Fine-tuning (SFT) on QA pairs

## Using the model
To use this model, you need to install LLaVA-NeXT as described in this repository: 

`https://github.com/LLaVA-VL/LLaVA-NeXT`

LLaVA-NeXT requires an outdated version of the `transformers` library (v4.40.0).
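
Because of this version pin, it can help to check the installed `transformers` release before loading the model. The snippet below is a minimal sketch using only the Python standard library; the helper names `installed_version` and `matches_required` are hypothetical, and the exact-match comparison is an assumption (other releases may or may not work).

```python
from importlib import metadata
from typing import Optional

# Version stated in this model card for LLaVA-NeXT
REQUIRED_TRANSFORMERS = "4.40.0"


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_required(installed: Optional[str],
                     required: str = REQUIRED_TRANSFORMERS) -> bool:
    """True if `installed` equals `required`, ignoring any local suffix such as '+cu118'."""
    return installed is not None and installed.split("+")[0] == required


found = installed_version("transformers")
if not matches_required(found):
    print(f"Warning: transformers {found} found, {REQUIRED_TRANSFORMERS} expected")
```

If the warning is printed, installing the pinned release in a dedicated virtual environment avoids clashing with other projects that need a newer `transformers`.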

To load the model:

```python
from llava.model.builder import load_pretrained_model

tokenizer, model, image_processor, max_length = load_pretrained_model(
  model_name_or_path="inaf-oact-ai/radiollava-7b-qa", 
  model_base=None, 
  model_name="llava_qwen", 
  device_map="auto"
)
```

To run model inference on an input image:

```python
import copy

import torch
from PIL import Image
from llava.model.builder import load_pretrained_model
from llava.mm_utils import process_images, tokenizer_image_token
from llava.constants import IMAGE_TOKEN_INDEX, DEFAULT_IMAGE_TOKEN
from llava.conversation import conv_templates


# - Load model
tokenizer, model, image_processor, max_length = load_pretrained_model(
  model_name_or_path="inaf-oact-ai/radiollava-7b-qa", 
  model_base=None, 
  model_name="llava_qwen", 
  device_map="auto"
)

# - Load image
#   For array data (e.g. read from a FITS file), use Image.fromarray(data) instead
image_path = ...
image = Image.open(image_path).convert("RGB")

# - Process image
image_tensor = process_images([image], image_processor, model.config)
image_tensor = [_image.to(dtype=torch.float16, device=model.device) for _image in image_tensor]

# - Create prompt
query = "Describe the input image"  # Replace it with your query
question = DEFAULT_IMAGE_TOKEN + "\n" + query
# Chat template name: "qwen_1_5" is the one used in the LLaVA-OneVision tutorial
# (an assumption here; change it if your setup uses a different template)
conv_template = "qwen_1_5"
conv = copy.deepcopy(conv_templates[conv_template])
conv.system = '<|im_start|>system\nYou are an AI assistant specialized in radio astronomical topics.'
conv.append_message(conv.roles[0], question)
conv.append_message(conv.roles[1], None)
prompt_question = conv.get_prompt()

# - Create model inputs
input_ids = tokenizer_image_token(
  prompt_question,
  tokenizer,
  IMAGE_TOKEN_INDEX,
  return_tensors="pt"
).unsqueeze(0).to(model.device)
image_sizes = [image.size]

# - Generate model response
#   Change generation parameters as you wish
do_sample = True
temperature = 0.3
max_new_tokens = 4096

output = model.generate(
  input_ids,
  images=image_tensor,
  image_sizes=image_sizes,
  do_sample=do_sample,
  temperature=temperature if do_sample else None,
  max_new_tokens=max_new_tokens,
)
output_parsed= tokenizer.decode(
  output[0],
  skip_special_tokens=True,
  clean_up_tokenization_spaces=False
)
	
# - Process response as you wish, e.g. strip surrounding whitespace
response = output_parsed.strip()
```
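
Radio images are often stored as 2D floating-point arrays (for example, read from a FITS file) that may contain blank (NaN) pixels, while the image processor expects an RGB image. The sketch below shows one possible way to convert such an array into an 8-bit, 3-channel array. The helper name `to_uint8_rgb` is hypothetical, and plain min-max scaling is an assumption made for illustration; it is not necessarily the preprocessing used to train the model.

```python
import numpy as np


def to_uint8_rgb(data: np.ndarray) -> np.ndarray:
    """Min-max scale a 2D array to uint8 and replicate it over 3 channels."""
    data = np.asarray(data, dtype=np.float64)
    if data.ndim != 2:
        raise ValueError("expected a 2D array")

    finite = np.isfinite(data)
    if not finite.any():
        # No valid pixels at all: return a black image
        return np.zeros(data.shape + (3,), dtype=np.uint8)

    lo = data[finite].min()
    hi = data[finite].max()

    # Replace NaN/inf (blank pixels) with the minimum so they map to 0
    clean = np.where(finite, data, lo)

    if hi > lo:
        scaled = (clean - lo) / (hi - lo)
    else:
        # Constant image: avoid division by zero
        scaled = np.zeros_like(clean)

    gray = np.round(scaled * 255).astype(np.uint8)
    return np.stack([gray, gray, gray], axis=-1)
```

The returned array has shape `(height, width, 3)` and can be passed to `Image.fromarray(...)` to obtain the PIL image expected by `process_images`.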

See the tutorials available in the LLaVA-NeXT repository:

`https://github.com/LLaVA-VL/LLaVA-NeXT/blob/main/docs/LLaVA_OneVision_Tutorials.ipynb`

Further usage examples are provided in this repository:

`https://github.com/SKA-INAF/radio-llava.git`