inaf-oact-ai
/

radiollava-7b-qa

Image-Text-to-Text

Model card Files Files and versions

radiollava-7b-qa / README.md

sriggi's picture

Update README.md

b846b3f verified 6 months ago

|

history blame contribute delete

3.51 kB

	---
	license: gpl-3.0
	language:
	- en
	base_model:
	- lmms-lab/llava-onevision-qwen2-7b-ov
	pipeline_tag: image-text-to-text
	tags:
	- radioastronomy
	---
	# radiollava-7b-qa

	https://arxiv.org/abs/2503.23859

	radiollava is a domain-specialized vision-language AI assistant tailored for research in radioastronomy, in particular for running
	radio source analysis tasks on radio-continuum images. It was trained on ~1.5M user-assistant conversations relative to ~55k radio
	images taken from various radio surveys, including ASKAP-EMU, MeerKAT SMGPS and VLA FIRST.

	## Model Details

	- Base Architecture: llava-onevision
	- Base Model: llava-onevision-qwen2-7b-ov
	- Parameters: 7 billion
	- Domain: Radio Astronomy
	- License: GPL 3.0 License
	- Development Process: Supervised Fine-tuning (SFT) on QA pairs

	## Using the model
	To use this model, you need to install LLaVA-NeXT as described in this repository:

	`https://github.com/LLaVA-VL/LLaVA-NeXT`

	LLaVA-NeXT requires an outdated version of the `transformers` library (v4.40.0).

	To load the model:

	```python
	from llava.model.builder import load_pretrained_model

	tokenizer, model, image_processor, max_length = load_pretrained_model(
	model_name_or_path="inaf-oact-ai/radiollava-7b-qa",
	model_base=None,
	model_name="llava_qwen",
	device_map="auto"
	)
	```

	To run model inference on an input image:

	```python
	import torch
	from PIL import Image
	from llava.model.builder import load_pretrained_model
	from llava.mm_utils import process_images, tokenizer_image_token
	from llava.constants import IMAGE_TOKEN_INDEX, DEFAULT_IMAGE_TOKEN
	from llava.conversation import conv_templates


	# - Load model
	tokenizer, model, image_processor, max_length = load_pretrained_model(
	model_name_or_path="inaf-oact-ai/radiollava-7b-qa",
	model_base=None,
	model_name="llava_qwen",
	device_map="auto"
	)

	# - Load image
	image_path= ...
	image= Image.fromarray(data).convert("RGB")

	# - Process image
	image_tensor = process_images([image], image_processor, model.config)
	image_tensor = [_image.to(dtype=torch.float16, device=model.device) for _image in image_tensor]

	# - Create prompt
	query= "Describe the input image" # Replace it with your query
	question = DEFAULT_IMAGE_TOKEN + "\n" + query
	conv = copy.deepcopy(conv_templates[conv_template])
	conv.system= '<\|im_start\|>system\nYou are an AI assistant specialized in radio astronomical topics.'
	conv.append_message(conv.roles[0], question)
	conv.append_message(conv.roles[1], None)
	prompt_question = conv.get_prompt()

	# - Create model inputs
	input_ids = tokenizer_image_token(
	prompt_question,
	tokenizer,
	IMAGE_TOKEN_INDEX,
	return_tensors="pt"
	).unsqueeze(0).to(model.device)
	image_sizes = [image.size]

	# - Generate model response
	# Change generation parameters as you wish
	do_sample=True
	temperature= 0.3
	max_new_tokens=4096

	output = model.generate(
	input_ids,
	images=image_tensor,
	image_sizes=image_sizes,
	do_sample=do_sample,
	temperature=temperature if do_sample else None,
	max_new_tokens=max_new_tokens,
	)
	output_parsed= tokenizer.decode(
	output[0],
	skip_special_tokens=True,
	clean_up_tokenization_spaces=False
	)

	# - Process response as you wish ...
	#response= output_parsed.strip("\n").strip()
	```

	See the tutorials available in the LLaVA-NeXT repository:

	`https://github.com/LLaVA-VL/LLaVA-NeXT/blob/main/docs/LLaVA_OneVision_Tutorials.ipynb`

	Further usage examples are provided in this repository:

	`https://github.com/SKA-INAF/radio-llava.git`