---
license: gpl-3.0
language:
- en
base_model:
- lmms-lab/llava-onevision-qwen2-7b-ov
pipeline_tag: image-text-to-text
tags:
- radioastronomy
---
# radiollava-7b-qa
https://arxiv.org/abs/2503.23859
radiollava is a domain-specialized vision-language AI assistant tailored for research in radio astronomy, in particular for running
radio source analysis tasks on radio-continuum images. It was trained on ~1.5M user-assistant conversations about ~55k radio
images taken from various radio surveys, including ASKAP-EMU, MeerKAT SMGPS and VLA FIRST.
## Model Details
- **Base Architecture**: llava-onevision
- **Base Model**: llava-onevision-qwen2-7b-ov
- **Parameters**: 7 billion
- **Domain**: Radio Astronomy
- **License**: GPL 3.0 License
- **Development Process**: Supervised Fine-tuning (SFT) on QA pairs
## Using the model
To use this model, you need to install LLaVA-NeXT as described in its repository:
`https://github.com/LLaVA-VL/LLaVA-NeXT`
LLaVA-NeXT requires an older version of the `transformers` library (v4.40.0).
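Because the pinned `transformers` release matters, it can help to check what is installed before loading the model. The snippet below is a minimal sketch using only the standard library (`importlib.metadata`); the helper names `parse_release` and `matches_required` are hypothetical, and treating anything other than an exact 4.40.0 release as a mismatch is a conservative assumption, not a documented LLaVA-NeXT rule.

```python
from importlib.metadata import PackageNotFoundError, version

# Version stated in this model card; exact-match policy is an assumption.
REQUIRED_TRANSFORMERS = "4.40.0"


def parse_release(ver: str) -> tuple:
    """Return the leading numeric release components, e.g. '4.40.0.dev0' -> (4, 40, 0)."""
    parts = []
    for piece in ver.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        if not digits:
            # Stop at the first non-numeric component (e.g. 'dev0').
            break
        parts.append(int(digits))
    return tuple(parts)


def matches_required(installed: str, required: str = REQUIRED_TRANSFORMERS) -> bool:
    """True if the installed release number equals the required one."""
    return parse_release(installed) == parse_release(required)


if __name__ == "__main__":
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        print("transformers is not installed")
    else:
        status = "OK" if matches_required(installed) else "MISMATCH"
        print(f"transformers {installed} (expected {REQUIRED_TRANSFORMERS}): {status}")
```

If it reports a mismatch, one option is to install the pinned release with `pip install transformers==4.40.0` inside the LLaVA-NeXT environment.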
To load the model:
```python
from llava.model.builder import load_pretrained_model

tokenizer, model, image_processor, max_length = load_pretrained_model(
    "inaf-oact-ai/radiollava-7b-qa",  # model_path: Hugging Face repo id or local directory
    model_base=None,
    model_name="llava_qwen",
    device_map="auto"
)
```
To run model inference on an input image:
```python
import copy

import torch
from PIL import Image
from llava.model.builder import load_pretrained_model
from llava.mm_utils import process_images, tokenizer_image_token
from llava.constants import IMAGE_TOKEN_INDEX, DEFAULT_IMAGE_TOKEN
from llava.conversation import conv_templates

# - Load model
tokenizer, model, image_processor, max_length = load_pretrained_model(
    "inaf-oact-ai/radiollava-7b-qa",  # model_path
    model_base=None,
    model_name="llava_qwen",
    device_map="auto"
)
model.eval()

# - Load image
image_path = ...  # path to your input image
image = Image.open(image_path).convert("RGB")
# If your image is a 2D numpy array `data` (e.g. read from a FITS file), use instead:
# image = Image.fromarray(data).convert("RGB")

# - Process image
image_tensor = process_images([image], image_processor, model.config)
image_tensor = [_image.to(dtype=torch.float16, device=model.device) for _image in image_tensor]

# - Create prompt
conv_template = "qwen_1_5"  # chat template used for Qwen2-based LLaVA-OneVision models
query = "Describe the input image"  # Replace it with your query
question = DEFAULT_IMAGE_TOKEN + "\n" + query
conv = copy.deepcopy(conv_templates[conv_template])
conv.system = "<|im_start|>system\nYou are an AI assistant specialized in radio astronomical topics."
conv.append_message(conv.roles[0], question)
conv.append_message(conv.roles[1], None)
prompt_question = conv.get_prompt()

# - Create model inputs
input_ids = tokenizer_image_token(
    prompt_question,
    tokenizer,
    IMAGE_TOKEN_INDEX,
    return_tensors="pt"
).unsqueeze(0).to(model.device)
image_sizes = [image.size]

# - Generate model response
# Change generation parameters as you wish
do_sample = True
temperature = 0.3
max_new_tokens = 4096
output = model.generate(
    input_ids,
    images=image_tensor,
    image_sizes=image_sizes,
    do_sample=do_sample,
    temperature=temperature if do_sample else None,
    max_new_tokens=max_new_tokens,
)
output_parsed = tokenizer.decode(
    output[0],
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False
)

# - Process response as you wish ...
response = output_parsed.strip()
print(response)
```
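When the input is a 2D numpy array `data` (e.g. read from a FITS file) rather than an image file, the floating-point pixel values, which may include NaNs, need to be mapped to 8 bits before `Image.fromarray`. The snippet below is a minimal sketch using plain min-max scaling; the helper name `array_to_rgb` is hypothetical, and this is not necessarily the normalization used to build radiollava's training images (the radio-llava repository is the place to check the authors' preprocessing).

```python
import numpy as np
from PIL import Image


def array_to_rgb(data: np.ndarray) -> Image.Image:
    """Min-max scale a 2D array (e.g. a radio map) to 8 bits and return an RGB PIL image.

    NaN/inf pixels (common at the edges of survey cutouts) are set to the minimum.
    Hypothetical helper: a generic scaling, not the authors' documented preprocessing.
    """
    arr = np.asarray(data, dtype=np.float64)
    if arr.ndim != 2:
        raise ValueError(f"expected a 2D array, got shape {arr.shape}")
    finite = np.isfinite(arr)
    if not finite.any():
        raise ValueError("array contains no finite pixels")
    lo, hi = arr[finite].min(), arr[finite].max()
    # Replace non-finite pixels with the minimum finite value.
    arr = np.where(finite, arr, lo)
    # Avoid division by zero on constant images.
    scaled = (arr - lo) / (hi - lo) if hi > lo else np.zeros_like(arr)
    img8 = np.round(scaled * 255).astype(np.uint8)
    # A 2D uint8 array becomes a grayscale ("L") image; replicate it to 3 channels.
    return Image.fromarray(img8).convert("RGB")
```

Its output can then be passed to `process_images` in place of `image`. Survey images with a large dynamic range may benefit from a different stretch (e.g. clipping or an asinh transform), but that choice is left to the user.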
See the tutorials available in the LLaVA-NeXT repository:
`https://github.com/LLaVA-VL/LLaVA-NeXT/blob/main/docs/LLaVA_OneVision_Tutorials.ipynb`
Further usage examples are provided in the radio-llava repository:
`https://github.com/SKA-INAF/radio-llava.git`