Mark-CHAE committed
Commit 423694a · verified · Parent: 8374b0f

Update README.md

Files changed (1): README.md (+197 −212)

README.md CHANGED
---
language:
- en
- ko
- zh
license: apache-2.0
library_name: peft
pipeline_tag: visual-question-answering
tags:
- vision
- visual-question-answering
- multimodal
- qwen
- lora
- tcm
- traditional-chinese-medicine
- tongue-diagnosis
---

# ViTCM_LLM - Traditional Chinese Medicine Tongue Diagnosis Model

This is a LoRA (Low-Rank Adaptation) adapter for the Qwen2.5-VL-32B-Instruct model, fine-tuned specifically for Traditional Chinese Medicine (TCM) tongue diagnosis tasks.

## Model Details

### Model Description

- **Developed by:** Mark-CHAE
- **Model type:** LoRA adapter for Qwen2.5-VL-32B-Instruct
- **Language(s) (NLP):** Chinese, Korean, English
- **License:** Apache-2.0
- **Finetuned from model:** Qwen/Qwen2.5-VL-32B-Instruct
- **Specialization:** Traditional Chinese Medicine tongue diagnosis

### Model Sources

- **Repository:** [Mark-CHAE/shezhen](https://huggingface.co/Mark-CHAE/shezhen)
- **Base Model:** [Qwen/Qwen2.5-VL-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-32B-Instruct)

## Uses

### Direct Use

This LoRA adapter can be used with the base Qwen2.5-VL-32B-Instruct model for multimodal vision-language tasks, including:

- Traditional Chinese Medicine tongue diagnosis
- Tongue image analysis and interpretation
- Visual question answering for medical images
- Multimodal medical conversations
- Symptom analysis from tongue images

### Downstream Use

The adapter can be loaded with the base model for inference, or reloaded in trainable mode for further fine-tuning on specific TCM diagnosis tasks, as sketched below.
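
A minimal sketch of the trainable-reload path, using PEFT's standard workflow rather than the author's actual training setup (dtype and device settings here are assumptions):

```python
# Minimal sketch: reload the published adapter in trainable mode to continue
# fine-tuning. Model IDs come from this card; dtype/device are assumptions.
import torch
from peft import PeftModel
from transformers import Qwen2_5_VLForConditionalGeneration

base = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-32B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# is_trainable=True keeps the LoRA weights unfrozen; the 32B base stays frozen.
model = PeftModel.from_pretrained(base, "Mark-CHAE/shezhen", is_trainable=True)
model.print_trainable_parameters()
```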
### Out-of-Scope Use

This model should not be used for:

- Generating harmful, offensive, or inappropriate content
- Creating deepfakes or misleading visual content
- Any illegal activities
- Making actual medical diagnoses without proper medical supervision

### Recommendations

Users should:

- Verify outputs for accuracy and appropriateness
- Be aware of potential biases in the model
- Use appropriate safety measures when deploying
- Not rely solely on this model for medical diagnosis
- Consult qualified medical professionals for actual diagnosis

## How to Get Started with the Model

### Using the Inference Widget

You can try the model directly in the browser using the Visual Question Answering widget above. Simply upload a tongue image and ask a question about it.

### Using the Model in Code

```python
from peft import PeftModel
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
import torch
from PIL import Image

# Load the base model and processor. Qwen2.5-VL needs its dedicated
# conditional-generation class (recent transformers), not AutoModelForCausalLM.
base_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-32B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-32B-Instruct")

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "Mark-CHAE/shezhen")

# Prepare inputs
image = Image.open("tongue_image.jpg")
question = "根据图片判断舌诊内容"  # "Give a tongue-diagnosis reading of this image."

# Build the prompt with the chat template so the image placeholder tokens
# match what the model expects.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": question},
        ],
    }
]
prompt = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)

# Generate a response
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        pad_token_id=processor.tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens, skipping the echoed prompt
generated = outputs[0][inputs["input_ids"].shape[-1]:]
answer = processor.decode(generated, skip_special_tokens=True)
print(answer)
```
## Training Details

### Training Data

The model was fine-tuned on multimodal vision-language data in Chinese, Korean, and English, with a specific focus on Traditional Chinese Medicine tongue diagnosis scenarios.

### Training Procedure

#### Training Hyperparameters

- **Training regime:** LoRA fine-tuning
- **LoRA rank:** 64
- **LoRA alpha:** 128
- **Target modules:** v_proj, qkv, attn.proj, q_proj, gate_proj, down_proj, up_proj, o_proj, k_proj (see the config sketch after this list)
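
As a reference point only, a PEFT `LoraConfig` matching these hyperparameters might look like the following; `lora_dropout` and `task_type` are assumptions, not values from this card:

```python
from peft import LoraConfig

# Sketch of a LoraConfig reproducing the card's hyperparameters.
# lora_dropout and task_type are illustrative assumptions.
lora_config = LoraConfig(
    r=64,               # LoRA rank
    lora_alpha=128,     # LoRA scaling factor
    target_modules=[
        "v_proj", "qkv", "attn.proj", "q_proj", "gate_proj",
        "down_proj", "up_proj", "o_proj", "k_proj",
    ],
    lora_dropout=0.05,  # assumption
    task_type="CAUSAL_LM",
)
```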
#### Speeds, Sizes, Times

- **Adapter size:** 2.2 GB (see the optional merge sketch below)
- **Base model:** Qwen2.5-VL-32B-Instruct (32B parameters)
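
Because the adapter adds 2.2 GB on top of the 32B base, some deployments may prefer to merge the LoRA weights into the base model once and serve a single checkpoint. A sketch using PEFT's standard merge API (not a step described in this card; the output directory is hypothetical):

```python
# Optional: fold the LoRA weights into the base model for adapter-free serving.
# `model` and `processor` are the objects from the getting-started example.
merged = model.merge_and_unload()          # merges LoRA deltas into base weights
merged.save_pretrained("vitcm-merged")     # hypothetical output directory
processor.save_pretrained("vitcm-merged")  # ship the processor alongside it
```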
## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

Evaluation was performed on multimodal vision-language benchmarks, with a focus on medical image understanding and TCM tongue diagnosis.

#### Metrics

Standard vision-language evaluation metrics were used, including accuracy, BLEU, and human evaluation scores; a BLEU-scoring sketch is shown below.
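
For illustration only (this card does not name an evaluation toolkit), corpus-level BLEU over generated answers could be computed with `sacrebleu`; the file names are hypothetical:

```python
# Illustrative BLEU scoring of generated answers against references.
# sacrebleu is an assumed toolkit choice; the file names are hypothetical.
import sacrebleu

with open("answers.txt", encoding="utf-8") as f:
    hypotheses = [line.strip() for line in f]
with open("references.txt", encoding="utf-8") as f:
    references = [line.strip() for line in f]

# One reference stream per hypothesis; tokenize="zh" is the usual choice
# for Chinese-language outputs.
bleu = sacrebleu.corpus_bleu(hypotheses, [references], tokenize="zh")
print(f"BLEU: {bleu.score:.2f}")
```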
### Results

[Evaluation results to be added]

#### Summary

This LoRA adapter provides an efficient way to adapt the Qwen2.5-VL-32B-Instruct model for Traditional Chinese Medicine tongue diagnosis tasks while maintaining the base model's capabilities.
## Technical Specifications

### Model Architecture and Objective

- **Architecture:** LoRA adapter for Qwen2.5-VL-32B-Instruct
- **Objective:** Multimodal vision-language understanding and generation, specialized for TCM tongue diagnosis

### Compute Infrastructure

#### Software

- PEFT 0.15.2
- Transformers library
- PyTorch

## Citation

**APA:**

Mark-CHAE. (2024). ViTCM_LLM: Traditional Chinese Medicine Tongue Diagnosis Model. Hugging Face. https://huggingface.co/Mark-CHAE/shezhen

## Model Card Contact

For questions about this model, please contact the model author.

### Framework versions
- PEFT 0.15.2