---
license: gpl-3.0
language:
- en
base_model:
- lmms-lab/llava-onevision-qwen2-7b-ov
pipeline_tag: image-text-to-text
tags:
- radioastronomy
---

# radiollava-7b-qa

https://arxiv.org/abs/2503.23859

radiollava is a domain-specialized vision-language assistant for radio astronomy research, designed in particular for
radio source analysis tasks on radio-continuum images. It was trained on ~1.5M user-assistant conversations about ~55k
radio images drawn from several radio surveys, including ASKAP-EMU, MeerKAT SMGPS and VLA FIRST.

## Model Details

- **Base Architecture**: llava-onevision
- **Base Model**: llava-onevision-qwen2-7b-ov
- **Parameters**: 7 billion
- **Domain**: Radio Astronomy
- **License**: GPL-3.0
- **Development Process**: Supervised Fine-tuning (SFT) on QA pairs

## Using the model

To use this model, first install LLaVA-NeXT as described in its repository:

`https://github.com/LLaVA-VL/LLaVA-NeXT`

Note that LLaVA-NeXT requires an older release of the `transformers` library (v4.40.0).

To load the model:

```python
from llava.model.builder import load_pretrained_model

# Arguments are passed positionally: model path, model base, model name
tokenizer, model, image_processor, max_length = load_pretrained_model(
    "inaf-oact-ai/radiollava-7b-qa",
    None,
    "llava_qwen",
    device_map="auto"
)
```
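
Because LLaVA-NeXT depends on a specific `transformers` release, it can help to verify the installed version before running the loading code above. The snippet below is a minimal sketch using only the Python standard library. The helper name `check_package_version` is hypothetical, and the exact-match comparison is an assumption (other nearby releases may also work).

```python
from importlib.metadata import PackageNotFoundError, version

# Version quoted in this model card for LLaVA-NeXT compatibility
REQUIRED_TRANSFORMERS = "4.40.0"


def check_package_version(package: str, required: str) -> bool:
    """Return True if `package` is installed at exactly version `required`.

    Hypothetical helper for illustration; it prints a short message
    and returns False when the package is missing or has another version.
    """
    try:
        installed = version(package)
    except PackageNotFoundError:
        print(f"{package} is not installed (expected {required})")
        return False
    if installed != required:
        print(f"{package} {installed} found, expected {required}")
        return False
    return True


if __name__ == "__main__":
    check_package_version("transformers", REQUIRED_TRANSFORMERS)
```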

To run model inference on an input image:

```python
import copy

import torch
from PIL import Image
from llava.model.builder import load_pretrained_model
from llava.mm_utils import process_images, tokenizer_image_token
from llava.constants import IMAGE_TOKEN_INDEX, DEFAULT_IMAGE_TOKEN
from llava.conversation import conv_templates

# - Load model (positional arguments: model path, model base, model name)
tokenizer, model, image_processor, max_length = load_pretrained_model(
    "inaf-oact-ai/radiollava-7b-qa",
    None,
    "llava_qwen",
    device_map="auto"
)
model.eval()

# - Load image
image_path = ...
# For pixel data already held in a uint8 array, use Image.fromarray(data) instead
image = Image.open(image_path).convert("RGB")

# - Process image
image_tensor = process_images([image], image_processor, model.config)
image_tensor = [_image.to(dtype=torch.float16, device=model.device) for _image in image_tensor]

# - Create prompt
conv_template = "qwen_1_5"  # Chat template used for Qwen-based models in the LLaVA-NeXT tutorial
query = "Describe the input image"  # Replace it with your query
question = DEFAULT_IMAGE_TOKEN + "\n" + query
conv = copy.deepcopy(conv_templates[conv_template])
conv.system = '<|im_start|>system\nYou are an AI assistant specialized in radio astronomical topics.'
conv.append_message(conv.roles[0], question)
conv.append_message(conv.roles[1], None)
prompt_question = conv.get_prompt()

# - Create model inputs
input_ids = tokenizer_image_token(
    prompt_question,
    tokenizer,
    IMAGE_TOKEN_INDEX,
    return_tensors="pt"
).unsqueeze(0).to(model.device)
image_sizes = [image.size]

# - Generate model response
# Change generation parameters as you wish
do_sample = True
temperature = 0.3
max_new_tokens = 4096

output = model.generate(
    input_ids,
    images=image_tensor,
    image_sizes=image_sizes,
    do_sample=do_sample,
    temperature=temperature if do_sample else None,
    max_new_tokens=max_new_tokens,
)
output_parsed = tokenizer.decode(
    output[0],
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False
)

# - Process response as you wish ...
# response = output_parsed.strip("\n").strip()
```
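
Radio images are often stored as floating-point arrays (for example, read from FITS files), whereas `Image.fromarray(data)` needs 8-bit data to produce a standard grayscale image. The sketch below shows one possible conversion. The helper name `array_to_rgb_image` is hypothetical, and the plain min-max scaling is an assumption: the preprocessing used to train this model may differ, so check the radio-llava repository before relying on it.

```python
import numpy as np
from PIL import Image


def array_to_rgb_image(data: np.ndarray) -> Image.Image:
    """Scale a 2D numeric array to 8 bits and return it as an RGB PIL image.

    Assumption: simple min-max scaling over the finite pixels. This is an
    illustrative choice, not necessarily the preprocessing used in training.
    """
    data = np.asarray(data, dtype=np.float64)
    finite = np.isfinite(data)
    if not finite.any():
        raise ValueError("image contains no finite pixels")

    vmin = data[finite].min()
    vmax = data[finite].max()

    # Replace NaN/inf (blanked pixels) with the minimum finite value
    filled = np.where(finite, data, vmin)

    # Map [vmin, vmax] to [0, 1]; a constant image maps to all zeros
    if vmax > vmin:
        scaled = (filled - vmin) / (vmax - vmin)
    else:
        scaled = np.zeros_like(filled)

    data_u8 = (scaled * 255).astype(np.uint8)
    return Image.fromarray(data_u8).convert("RGB")
```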

See the tutorials available in the LLaVA-NeXT repository:

`https://github.com/LLaVA-VL/LLaVA-NeXT/blob/main/docs/LLaVA_OneVision_Tutorials.ipynb`

Further usage examples are provided in this repository:

`https://github.com/SKA-INAF/radio-llava.git`