ViTCM_LLM - Traditional Chinese Medicine Tongue Diagnosis Model

This is a LoRA (Low-Rank Adaptation) adapter for the Qwen2.5-VL-32B-Instruct model, fine-tuned specifically for Traditional Chinese Medicine (TCM) tongue diagnosis tasks.

Model Details

Model Description

  • Developed by: Mark-CHAE
  • Model type: LoRA Adapter for Qwen2.5-VL-32B-Instruct
  • Language(s) (NLP): Chinese
  • License: Apache-2.0
  • Finetuned from model: Qwen/Qwen2.5-VL-32B-Instruct
  • Specialization: Traditional Chinese Medicine Tongue Diagnosis

Uses

Direct Use

This LoRA adapter can be used with the base Qwen2.5-VL-32B-Instruct model for multimodal vision-language tasks including:

  • Traditional Chinese Medicine tongue diagnosis
  • Tongue image analysis and interpretation
  • Visual question answering for medical images
  • Multimodal medical conversations
  • Symptom analysis from tongue images

Downstream Use

The adapter can be loaded with the base model for inference, or loaded in trainable mode for further fine-tuning on specific TCM diagnosis tasks; a sketch of the latter follows.
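A minimal sketch of the trainable-mode loading path using PEFT's is_trainable flag. It assumes base_model is the already-loaded Qwen2.5-VL-32B-Instruct model (as in the inference example further down); the training data and trainer setup are out of scope here.

from peft import PeftModel

# Load the adapter with its weights unfrozen so it can be fine-tuned further
# (assumes `base_model` is the loaded Qwen2.5-VL-32B-Instruct model)
model = PeftModel.from_pretrained(
    base_model,
    "Mark-CHAE/ViTCM-LLM",
    is_trainable=True,
)
model.print_trainable_parameters()  # only the LoRA weights are trainable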

How to Get Started with the Model

Using the Inference Widget

You can try the model directly in the browser using the Visual Question Answering widget above. Simply upload a tongue image and ask a question about it.

Using the Model in Code

from peft import PeftModel
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
import torch
from PIL import Image

# Load the base model and processor
# (requires a transformers version with Qwen2.5-VL support)
base_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-32B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto"
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-32B-Instruct")

# Load the LoRA adapter
model = PeftModel.from_pretrained(base_model, "Mark-CHAE/ViTCM-LLM")

# Prepare inputs
image = Image.open("tongue_image.jpg")
question = "根据图片判断舌诊内容"  # "Interpret the tongue diagnosis based on the image."

# Build the prompt with the processor's chat template rather than
# hand-writing special tokens
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": question},
        ],
    }
]
prompt = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = processor(
    text=[prompt],
    images=[image],
    return_tensors="pt"
).to(model.device)

# Generate a response
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        top_p=0.9,
        do_sample=True
    )

# Decode only the newly generated tokens, skipping the echoed prompt
generated = outputs[0][inputs["input_ids"].shape[1]:]
answer = processor.decode(generated, skip_special_tokens=True).strip()
print(answer)
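If you prefer to deploy without a runtime PEFT dependency, the LoRA weights can optionally be merged into the base model first. This is standard PEFT usage rather than anything specific to this adapter, and the output directory name below is just a placeholder:

# Merge the LoRA weights into the base model and save a standalone copy
merged_model = model.merge_and_unload()
merged_model.save_pretrained("vitcm-merged")   # placeholder output directory
processor.save_pretrained("vitcm-merged")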

Training Procedure

Training Hyperparameters

  • Training regime: LoRA fine-tuning
  • LoRA rank: 64
  • LoRA alpha: 128
  • Target modules: v_proj, qkv, attn.proj, q_proj, gate_proj, down_proj, up_proj, o_proj, k_proj
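For reference, a hypothetical reconstruction of this setup as a PEFT LoraConfig, built only from the values listed above (the actual training configuration may have differed):

from peft import LoraConfig

lora_config = LoraConfig(
    r=64,                                          # LoRA rank
    lora_alpha=128,                                # LoRA alpha
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",    # language-model attention
        "gate_proj", "up_proj", "down_proj",       # language-model MLP
        "qkv", "attn.proj",                        # vision-tower attention
    ],
    task_type="CAUSAL_LM",
)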

Speeds, Sizes, Times

  • Adapter size: 2.2 GB
  • Base model: Qwen2.5-VL-32B-Instruct (32B parameters)

Software

  • PEFT 0.15.2
  • Transformers library
  • PyTorch

Citation

APA:

Mark-CHAE. (2024). ViTCM_LLM: Traditional Chinese Medicine Tongue Diagnosis Model. Hugging Face. https://huggingface.co/Mark-CHAE/shezhen

Model Card Contact

For questions about this model, please contact the model author.

Framework versions

  • PEFT 0.15.2