---
language:
- en
- ko
- zh
license: apache-2.0
library_name: peft
pipeline_tag: visual-question-answering
tags:
- vision
- visual-question-answering
- multimodal
- qwen
- lora
- tcm
- traditional-chinese-medicine
- tongue-diagnosis
---
# ViTCM_LLM - Traditional Chinese Medicine Tongue Diagnosis Model
This is a LoRA (Low-Rank Adaptation) adapter for the Qwen2.5-VL-32B-Instruct model, fine-tuned specifically for Traditional Chinese Medicine (TCM) tongue diagnosis tasks.
## Model Details
### Model Description
- **Developed by:** Mark-CHAE
- **Model type:** LoRA Adapter for Qwen2.5-VL-32B-Instruct
- **Language(s) (NLP):** Chinese, English, Korean
- **License:** Apache-2.0
- **Finetuned from model:** Qwen/Qwen2.5-VL-32B-Instruct
- **Specialization:** Traditional Chinese Medicine Tongue Diagnosis
### Model Sources
- **Repository:** [Mark-CHAE/ViTCM-LLM](https://huggingface.co/Mark-CHAE/ViTCM-LLM)
- **Base Model:** [Qwen/Qwen2.5-VL-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-32B-Instruct)
## Uses
### Direct Use
This LoRA adapter can be used with the base Qwen2.5-VL-32B-Instruct model for multimodal vision-language tasks including:
- Traditional Chinese Medicine tongue diagnosis
- Tongue image analysis and interpretation
- Visual question answering for medical images
- Multimodal medical conversations
- Symptom analysis from tongue images
### Downstream Use
The adapter can be loaded with the base model for inference or further fine-tuning on specific TCM diagnosis tasks.
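For adapter-free deployment, the LoRA weights can also be folded into the base model with PEFT's `merge_and_unload`. A minimal sketch (the output path is illustrative, not part of this release):

```python
from peft import PeftModel
from transformers import Qwen2_5_VLForConditionalGeneration
import torch

base = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-32B-Instruct", torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "Mark-CHAE/ViTCM-LLM")

# Fold the LoRA deltas into the base weights and drop the PEFT wrapper
merged = model.merge_and_unload()
merged.save_pretrained("./vitcm-llm-merged")  # illustrative output path
```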
### Out-of-Scope Use
This model should not be used for:
- Generating harmful, offensive, or inappropriate content
- Creating deepfakes or misleading visual content
- Any illegal activities
- Making actual medical diagnoses without proper medical supervision
### Recommendations
Users should:
- Verify outputs for accuracy and appropriateness
- Be aware of potential biases in the model
- Use appropriate safety measures when deploying
- Not rely solely on this model for medical diagnosis
- Consult qualified medical professionals for actual diagnosis
## How to Get Started with the Model
### Using the Inference Widget
You can try the model directly in the browser using the Visual Question Answering widget above. Simply upload a tongue image and ask a question about it.
### Using the Model in Code
```python
from peft import PeftModel
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
import torch
from PIL import Image

# Load the base model (Qwen2.5-VL needs its dedicated class rather than
# AutoModelForCausalLM; requires transformers >= 4.49)
base_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-32B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-32B-Instruct")

# Attach the LoRA adapter
model = PeftModel.from_pretrained(base_model, "Mark-CHAE/ViTCM-LLM")

# Build the prompt with the chat template so the image placeholder tokens
# are inserted in the format the model expects
image = Image.open("tongue_image.jpg")
question = "根据图片判断舌诊内容"  # "Interpret the tongue diagnosis from the image"
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": question},
        ],
    }
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)

# Generate a response
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )

# Decode only the newly generated tokens, skipping the prompt
generated = outputs[0][inputs["input_ids"].shape[1]:]
answer = processor.decode(generated, skip_special_tokens=True)
print(answer)
```
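The example uses `apply_chat_template` rather than a hand-written prompt string: the Qwen2.5-VL processor expands the image entry into the vision placeholder tokens the model was trained on, which a literal `<image>` tag in the prompt would not produce.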
## Training Details
### Training Procedure
#### Training Hyperparameters
- **Training regime:** LoRA fine-tuning
- **LoRA rank:** 64
- **LoRA alpha:** 128
- **Target modules:** v_proj, qkv, attn.proj, q_proj, gate_proj, down_proj, up_proj, o_proj, k_proj
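For reference, a PEFT `LoraConfig` reproducing these hyperparameters might look as follows; dropout, bias handling, and other settings were not published and are left at PEFT defaults here:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,            # LoRA rank
    lora_alpha=128,  # LoRA alpha (scaling = alpha / rank = 2.0)
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # language-model attention
        "gate_proj", "up_proj", "down_proj",     # language-model MLP
        "qkv", "attn.proj",                      # vision-tower attention
    ],
    task_type="CAUSAL_LM",
)
```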
#### Speeds, Sizes, Times
- **Adapter size:** 2.2GB
- **Base model:** Qwen2.5-VL-32B-Instruct (32B parameters)
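The 32B base model is large for a single GPU; one common workaround (an assumption here, not part of the released setup) is to load the base in 4-bit with bitsandbytes before attaching the adapter:

```python
from peft import PeftModel
from transformers import Qwen2_5_VLForConditionalGeneration, BitsAndBytesConfig
import torch

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
base = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-32B-Instruct",
    quantization_config=bnb,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "Mark-CHAE/ViTCM-LLM")
```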
#### Software
- PEFT 0.15.2
- Transformers library
- PyTorch
## Citation
**APA:**
Mark-CHAE. (2024). *ViTCM_LLM: Traditional Chinese Medicine Tongue Diagnosis Model*. Hugging Face. https://huggingface.co/Mark-CHAE/ViTCM-LLM
## Model Card Contact
For questions about this model, please contact the model author.