---
language:
- en
- ko
- zh
license: apache-2.0
library_name: peft
pipeline_tag: visual-question-answering
tags:
- vision
- visual-question-answering
- multimodal
- qwen
- lora
- tcm
- traditional-chinese-medicine
- tongue-diagnosis
---

# ViTCM_LLM - Traditional Chinese Medicine Tongue Diagnosis Model

This is a LoRA (Low-Rank Adaptation) adapter for the Qwen2.5-VL-32B-Instruct model, fine-tuned specifically for Traditional Chinese Medicine (TCM) tongue diagnosis tasks.

## Model Details

### Model Description

- **Developed by:** Mark-CHAE
- **Model type:** LoRA Adapter for Qwen2.5-VL-32B-Instruct
- **Language(s) (NLP):** Chinese, Korean, English
- **License:** Apache-2.0
- **Finetuned from model:** Qwen/Qwen2.5-VL-32B-Instruct
- **Specialization:** Traditional Chinese Medicine Tongue Diagnosis

### Model Sources

- **Repository:** [Mark-CHAE/ViTCM-LLM](https://huggingface.co/Mark-CHAE/ViTCM-LLM)
- **Base Model:** [Qwen/Qwen2.5-VL-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-32B-Instruct)

## Uses

### Direct Use

This LoRA adapter can be used with the base Qwen2.5-VL-32B-Instruct model for multimodal vision-language tasks including:

- Traditional Chinese Medicine tongue diagnosis
- Tongue image analysis and interpretation
- Visual question answering for medical images
- Multimodal medical conversations
- Symptom analysis from tongue images

### Downstream Use

The adapter can be loaded with the base model for inference or further fine-tuning on specific TCM diagnosis tasks.
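
A minimal sketch of loading the adapter for further fine-tuning (this assumes a PEFT-based training setup; pass `is_trainable=True` so the LoRA weights stay unfrozen, and omit it for inference-only loading):

```python
from peft import PeftModel
from transformers import Qwen2_5_VLForConditionalGeneration
import torch

base_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-32B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto",
)

# is_trainable=True keeps the LoRA parameters unfrozen so a downstream
# trainer can update them; without it the adapter loads frozen.
model = PeftModel.from_pretrained(
    base_model,
    "Mark-CHAE/ViTCM-LLM",
    is_trainable=True,
)
model.print_trainable_parameters()
```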

### Out-of-Scope Use

This model should not be used for:

- Generating harmful, offensive, or inappropriate content
- Creating deepfakes or misleading visual content
- Any illegal activities
- Making actual medical diagnoses without proper medical supervision

### Recommendations

Users should:

- Verify outputs for accuracy and appropriateness
- Be aware of potential biases in the model
- Use appropriate safety measures when deploying
- Not rely solely on this model for medical diagnosis
- Consult qualified medical professionals for actual diagnosis

## How to Get Started with the Model

### Using the Inference Widget

You can try the model directly in the browser using the Visual Question Answering widget above. Simply upload a tongue image and ask a question about it.

### Using the Model in Code

```python
from peft import PeftModel
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
import torch
from PIL import Image

# Load base model and processor.
# Qwen2.5-VL is a vision-language model, so it is loaded with
# Qwen2_5_VLForConditionalGeneration rather than AutoModelForCausalLM.
base_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-32B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto"
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-32B-Instruct")

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "Mark-CHAE/ViTCM-LLM")

# Prepare inputs
image = Image.open("tongue_image.jpg")
question = "根据图片判断舌诊内容"  # "Assess the tongue diagnosis from the image"

# Build the prompt with the processor's chat template so the image
# placeholder tokens match what the model expects.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": question},
        ],
    }
]
prompt = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = processor(
    text=[prompt],
    images=[image],
    return_tensors="pt"
).to(model.device)

# Generate response
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        pad_token_id=processor.tokenizer.eos_token_id
    )

# Decode only the newly generated tokens, not the echoed prompt
generated = outputs[0][inputs.input_ids.shape[1]:]
answer = processor.decode(generated, skip_special_tokens=True)
print(answer)
```
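
Note that the 32B base model needs roughly 64 GB of accelerator memory in float16, so `device_map="auto"` will shard it across all available GPUs; quantized loading (for example 4-bit via `BitsAndBytesConfig`) is an option when memory is tight, though it was not used on this card.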


## Training Details

### Training Procedure

#### Training Hyperparameters

- **Training regime:** LoRA fine-tuning
- **LoRA rank:** 64
- **LoRA alpha:** 128
- **Target modules:** v_proj, qkv, attn.proj, q_proj, gate_proj, down_proj, up_proj, o_proj, k_proj
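
For reference, these hyperparameters map onto a PEFT `LoraConfig` roughly as follows (a sketch: the module grouping comments and the `task_type` are assumptions, since the card does not report them):

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,            # LoRA rank, as listed above
    lora_alpha=128,  # LoRA alpha, as listed above
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
        "qkv", "attn.proj",                      # vision-tower attention (assumed)
    ],
    task_type="CAUSAL_LM",                       # assumed task type
)
```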


#### Speeds, Sizes, Times

- **Adapter size:** 2.2GB
- **Base model:** Qwen2.5-VL-32B-Instruct (32B parameters)
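
Since the adapter adds an extra indirection at inference time, it can optionally be merged into the base weights with PEFT's `merge_and_unload` before deployment (a sketch; the output path is hypothetical, and the merged checkpoint occupies the full footprint of the base model):

```python
from peft import PeftModel
from transformers import Qwen2_5_VLForConditionalGeneration
import torch

base_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-32B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "Mark-CHAE/ViTCM-LLM")

# Fold the LoRA deltas into the base weights, drop the PEFT wrappers,
# and save a standalone checkpoint (hypothetical output path).
merged = model.merge_and_unload()
merged.save_pretrained("vitcm-llm-merged")
```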


#### Software

- PEFT 0.15.2
- Transformers library
- PyTorch



## Citation

**APA:**

Mark-CHAE. (2024). ViTCM_LLM: Traditional Chinese Medicine Tongue Diagnosis Model. Hugging Face. https://huggingface.co/Mark-CHAE/ViTCM-LLM

## Model Card Contact

For questions about this model, please contact the model author.

### Framework versions

- PEFT 0.15.2