ViTCM_LLM - Traditional Chinese Medicine Tongue Diagnosis Model

This is a LoRA (Low-Rank Adaptation) adapter for the Qwen2.5-VL-32B-Instruct model, fine-tuned specifically for Traditional Chinese Medicine (TCM) tongue diagnosis tasks.

Model Details

Model Description

  • Developed by: Mark-CHAE
  • Model type: LoRA Adapter for Qwen2.5-VL-32B-Instruct
  • Language(s) (NLP): Chinese
  • License: Apache-2.0
  • Finetuned from model: Qwen/Qwen2.5-VL-32B-Instruct
  • Specialization: Traditional Chinese Medicine Tongue Diagnosis

Uses

Direct Use

This LoRA adapter can be used with the base Qwen2.5-VL-32B-Instruct model for multimodal vision-language tasks including:

  • Traditional Chinese Medicine tongue diagnosis
  • Tongue image analysis and interpretation
  • Visual question answering for medical images
  • Multimodal medical conversations
  • Symptom analysis from tongue images

Downstream Use

The adapter can be loaded with the base model for inference, or loaded in trainable mode for further fine-tuning on specific TCM diagnosis tasks; a sketch of the latter follows.
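A minimal sketch of the trainable-mode loading path using PEFT's is_trainable flag. It assumes base_model is the already-loaded Qwen2.5-VL-32B-Instruct model (as in the inference example further down); the training data and trainer setup are out of scope here.

from peft import PeftModel

# Load the adapter with its weights unfrozen so it can be fine-tuned further
# (assumes `base_model` is the loaded Qwen2.5-VL-32B-Instruct model)
model = PeftModel.from_pretrained(
    base_model,
    "Mark-CHAE/ViTCM-LLM",
    is_trainable=True,
)
model.print_trainable_parameters()  # only the LoRA weights are trainable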

How to Get Started with the Model

Using the Inference Widget

You can try the model directly in the browser using the Visual Question Answering widget above. Simply upload a tongue image and ask a question about it.

Using the Model in Code

from peft import PeftModel
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
import torch
from PIL import Image

# Load the base model and processor
# (requires a transformers version with Qwen2.5-VL support)
base_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-32B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto"
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-32B-Instruct")

# Load the LoRA adapter
model = PeftModel.from_pretrained(base_model, "Mark-CHAE/ViTCM-LLM")

# Prepare inputs
image = Image.open("tongue_image.jpg")
question = "根据图片判断舌诊内容"  # "Interpret the tongue diagnosis based on the image."

# Build the prompt with the processor's chat template rather than
# hand-writing special tokens
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": question},
        ],
    }
]
prompt = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = processor(
    text=[prompt],
    images=[image],
    return_tensors="pt"
).to(model.device)

# Generate a response
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        top_p=0.9,
        do_sample=True
    )

# Decode only the newly generated tokens, skipping the echoed prompt
generated = outputs[0][inputs["input_ids"].shape[1]:]
answer = processor.decode(generated, skip_special_tokens=True).strip()
print(answer)
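If you prefer to deploy without a runtime PEFT dependency, the LoRA weights can optionally be merged into the base model first. This is standard PEFT usage rather than anything specific to this adapter, and the output directory name below is just a placeholder:

# Merge the LoRA weights into the base model and save a standalone copy
merged_model = model.merge_and_unload()
merged_model.save_pretrained("vitcm-merged")   # placeholder output directory
processor.save_pretrained("vitcm-merged")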

Training Procedure

Training Hyperparameters

  • Training regime: LoRA fine-tuning
  • LoRA rank: 64
  • LoRA alpha: 128
  • Target modules: v_proj, qkv, attn.proj, q_proj, gate_proj, down_proj, up_proj, o_proj, k_proj
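For reference, a hypothetical reconstruction of this setup as a PEFT LoraConfig, built only from the values listed above (the actual training configuration may have differed):

from peft import LoraConfig

lora_config = LoraConfig(
    r=64,                                          # LoRA rank
    lora_alpha=128,                                # LoRA alpha
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",    # language-model attention
        "gate_proj", "up_proj", "down_proj",       # language-model MLP
        "qkv", "attn.proj",                        # vision-tower attention
    ],
    task_type="CAUSAL_LM",
)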

Speeds, Sizes, Times

  • Adapter size: 2.2 GB
  • Base model: Qwen2.5-VL-32B-Instruct (32B parameters)

Software

  • PEFT 0.15.2
  • Transformers library
  • PyTorch

Citation

APA:

Mark-CHAE. (2024). ViTCM_LLM: Traditional Chinese Medicine Tongue Diagnosis Model. Hugging Face. https://huggingface.co/Mark-CHAE/shezhen

Model Card Contact

For questions about this model, please contact the model author.

Framework versions

  • PEFT 0.15.2