Pixuai/tagscoring
Llama-3.2-11B-Vision-Instruct-TagRater is a merged multimodal model that rates how well an image matches a provided tagword. Combining visual and language understanding, it evaluates the image against a rating prompt and returns a concise explanation together with a relevance rating from 0 to 5.
Fine-tuned from unsloth/Llama-3.2-11B-Vision-Instruct with LoRA (r=16, lora_alpha=16).
During training, the following text instruction was used to guide the model in rating images against the provided tagword:
Evaluate how well this image matches the search term: [tagword] . Provide a concise reason and assign a 0–5 relevance score:
Scoring (0–5):
0 – Not Relevant: No connection.
1 – Barely Relevant: Very weak or vague link.
2 – Minimally Relevant: Hints but lacks clarity.
3 – Moderately Relevant: Noticeable link, not the main focus.
4 – Highly Relevant: Strong, clear representation.
5 – Perfectly Relevant: Ideal example.
- Content Relevance: Does it clearly relate?
- Context & Setting: Does its overall style fit the theme?
- Visual Appeal & User Satisfaction: Would users find this image useful or satisfying based on the search term?
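Because the instruction is a fixed template with a single [tagword] slot, it can be assembled programmatically instead of being pasted by hand. The following is a minimal sketch of one way to do that. The names `PROMPT_TEMPLATE` and `build_prompt` are illustrative only and are not part of the model or of Unsloth; the wording mirrors the single-line prompt used in the inference example.

```python
# Hypothetical helper (not part of the model card's API): fills the
# [tagword] slot of the rating instruction shown above.
PROMPT_TEMPLATE = (
    "Evaluate how well this image matches the search term: {tagword} . "
    "Provide a concise reason and assign a 0–5 relevance score: "
    "Scoring (0–5): "
    "0 – Not Relevant: No connection. "
    "1 – Barely Relevant: Very weak or vague link. "
    "2 – Minimally Relevant: Hints but lacks clarity. "
    "3 – Moderately Relevant: Noticeable link, not the main focus. "
    "4 – Highly Relevant: Strong, clear representation. "
    "5 – Perfectly Relevant: Ideal example. "
    "- Content Relevance: Does it clearly relate? "
    "- Context & Setting: Does its overall style fit the theme? "
    "- Visual Appeal & User Satisfaction: Would users find this image "
    "useful or satisfying based on the search term?"
)


def build_prompt(tagword: str) -> str:
    """Return the rating instruction for a single tagword."""
    tagword = tagword.strip()
    if not tagword:
        raise ValueError("tagword must not be empty")
    return PROMPT_TEMPLATE.format(tagword=tagword)
```

Keeping the wording identical to the training instruction is probably the safest choice, since the model was tuned on this exact phrasing.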
During inference, the model has demonstrated the following performance:
These metrics provide insights into the model’s computational efficiency during operation.
from unsloth import FastVisionModel
import base64
import json
import re
from io import BytesIO
from PIL import Image
import torch
# --- Model Initialization ---
model, tokenizer = FastVisionModel.from_pretrained(
    model_name="Pixuai/Llama-3.2-11B-Vision-Instruct-TagRater",
    load_in_4bit=True,
    max_seq_length=150,
)
FastVisionModel.for_inference(model)
# --- Input Preparation ---
# Replace these variables with your actual prompt and base64 encoded image string.
prompt = "Evaluate how well this image matches the search term: space explorer . Provide a concise reason and assign a 0–5 relevance score: Scoring (0–5): 0 – Not Relevant: No connection. 1 – Barely Relevant: Very weak or vague link. 2 – Minimally Relevant: Hints but lacks clarity. 3 – Moderately Relevant: Noticeable link, not the main focus. 4 – Highly Relevant: Strong, clear representation. 5 – Perfectly Relevant: Ideal example. - Content Relevance: Does it clearly relate? - Context & Setting: Does its overall style fit the theme? - Visual Appeal & User Satisfaction: Would users find this image useful or satisfying based on the search term?"
image_b64 = "base64_string_here" # Replace with a valid base64 encoded image string
try:
    image_data = base64.b64decode(image_b64)
    image = Image.open(BytesIO(image_data)).convert("RGB")
except Exception as e:
    # Stop early: nothing below can run without a valid image.
    print(f"Error decoding image: {e}")
    raise SystemExit(1)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": prompt},
        ],
    }
]
input_text = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
inputs = tokenizer(
    image,
    input_text,
    add_special_tokens=False,
    return_tensors="pt",
).to("cuda")
# --- Inference ---
gen_tokens = model.generate(
    **inputs,
    max_new_tokens=100,
    use_cache=True,
    temperature=0.1,
    min_p=0.1,
)
# --- Decoding Output ---
output_text = tokenizer.decode(gen_tokens[0], skip_special_tokens=True)
# Greedy match from the first "{" to the last "}" in the decoded text.
json_match = re.search(r'({.*})', output_text, re.DOTALL)
if json_match:
    json_str = json_match.group(1)
    try:
        json_obj = json.loads(json_str)
    except json.JSONDecodeError:
        print("Invalid JSON output.")
        raise SystemExit(1)
    print("JSON Output:", json_obj)
else:
    print("No JSON object found in the output.")
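The regular expression in the example spans from the first "{" to the last "}", so any stray brace in the generated text makes parsing fail. As a hedged alternative, the sketch below scans for the first decodable JSON object with the standard library's `json.JSONDecoder.raw_decode` and then checks that the rating lies in the 0–5 range. The helper names and the key names `"reason"` and `"score"` are assumptions made for illustration; the model card does not document the output schema, so adjust the key to whatever the model actually emits.

```python
import json
from typing import Optional


def extract_first_json(text: str) -> Optional[dict]:
    """Return the first JSON object embedded in text, or None if there is none."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at index `start`
            # and ignores whatever follows it.
            obj, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        start = text.find("{", start + 1)
    return None


def read_score(obj: dict, key: str = "score") -> Optional[int]:
    """Return the rating if it is an integer in 0..5, else None.

    The default key name "score" is an assumption, not a documented field.
    """
    value = obj.get(key)
    if isinstance(value, bool) or not isinstance(value, int):
        return None
    return value if 0 <= value <= 5 else None
```

With these helpers, `read_score(extract_first_json(output_text) or {})` yields either a valid rating or `None`, which lets the caller decide whether to retry generation.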
License: MIT
Base model: meta-llama/Llama-3.2-11B-Vision-Instruct