Mark-CHAE/ViTCM-LLM

@@ -1,213 +1,198 @@
----
-language:
-- en
-- ko
-- zh
-license: apache-2.0
-library_name: peft
-pipeline_tag: visual-question-answering
-tags:
-- vision
-- visual-question-answering
-- multimodal
-- qwen
-- lora
-- tcm
-- traditional-chinese-medicine
-- tongue-diagnosis
----
-# ViTCM_LLM - Traditional Chinese Medicine Tongue Diagnosis Model
-This is a LoRA (Low-Rank Adaptation) adapter for the Qwen2.5-VL-32B-Instruct model, fine-tuned specifically for Traditional Chinese Medicine (TCM) tongue diagnosis tasks.
-## Model Details
-### Model Description
-- **Developed by:** Mark-CHAE
-- **Model type:** LoRA Adapter for Qwen2.5-VL-32B-Instruct
-- **Language(s) (NLP):** Chinese, Korean, English
-- **License:** Apache-2.0
-- **Finetuned from model:** Qwen/Qwen2.5-VL-32B-Instruct
-- **Specialization:** Traditional Chinese Medicine Tongue Diagnosis
-### Model Sources
-- **Repository:** [Mark-CHAE/shezhen](https://huggingface.co/Mark-CHAE/shezhen)
-- **Base Model:** [Qwen/Qwen2.5-VL-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-32B-Instruct)
-## Uses
-### Direct Use
-This LoRA adapter can be used with the base Qwen2.5-VL-32B-Instruct model for multimodal vision-language tasks including:
-- Traditional Chinese Medicine tongue diagnosis
-- Tongue image analysis and interpretation
-- Visual question answering for medical images
-- Multimodal medical conversations
-- Symptom analysis from tongue images
-### Downstream Use
-The adapter can be loaded with the base model for inference or further fine-tuning on specific TCM diagnosis tasks.
-### Out-of-Scope Use
-This model should not be used for:
-- Generating harmful, offensive, or inappropriate content
-- Creating deepfakes or misleading visual content
-- Any illegal activities
-- Making actual medical diagnoses without proper medical supervision
-### Recommendations
-Users should:
-- Verify outputs for accuracy and appropriateness
-- Be aware of potential biases in the model
-- Use appropriate safety measures when deploying
-- Not rely solely on this model for medical diagnosis
-- Consult qualified medical professionals for actual diagnosis
-## How to Get Started with the Model
-### Using the Inference Widget
-You can try the model directly in the browser using the Visual Question Answering widget above. Simply upload a tongue image and ask a question about it.
-### Using the Model in Code
-```python
-import torch
-from peft import PeftModel
-from PIL import Image
-from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
-# Load the base model and its processor (the processor wraps the tokenizer)
-base_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
-    "Qwen/Qwen2.5-VL-32B-Instruct",
-    torch_dtype=torch.float16,
-    device_map="auto"
-)
-processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-32B-Instruct")
-# Load LoRA adapter
-model = PeftModel.from_pretrained(base_model, "Mark-CHAE/shezhen")
-model.eval()
-# Prepare inputs; the chat template inserts the image placeholder tokens
-image = Image.open("tongue_image.jpg").convert("RGB")
-question = "根据图片判断舌诊内容"  # "Assess the tongue diagnosis findings from the image"
-messages = [{"role": "user", "content": [{"type": "image"}, {"type": "text", "text": question}]}]
-prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
-inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)
-# Generate response
-with torch.no_grad():
-    outputs = model.generate(
-        **inputs,
-        max_new_tokens=512,  # limits generated tokens only; image tokens already fill the prompt
-        temperature=0.7,
-        top_p=0.9,
-        do_sample=True
-    )
-# Decode only the newly generated tokens, skipping the prompt
-new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
-answer = processor.decode(new_tokens, skip_special_tokens=True).strip()
-print(answer)
-```
-## Training Details
-### Training Data
-The model was fine-tuned on multimodal vision-language data including Chinese, Korean, and English content, with specific focus on Traditional Chinese Medicine tongue diagnosis scenarios.
-### Training Procedure
-#### Training Hyperparameters
-- **Training regime:** LoRA fine-tuning
-- **LoRA rank:** 64
-- **LoRA alpha:** 128
-- **Target modules:** v_proj, qkv, attn.proj, q_proj, gate_proj, down_proj, up_proj, o_proj, k_proj
-- **Training steps:** 2700
-- **Epochs:** ~8.9
-#### Speeds, Sizes, Times
-- **Adapter size:** 2.2GB
-- **Base model:** Qwen2.5-VL-32B-Instruct (32B parameters)
-## Evaluation
-### Testing Data, Factors & Metrics
-#### Testing Data
-Evaluation was performed on multimodal vision-language benchmarks with focus on medical image understanding and TCM tongue diagnosis.
-#### Metrics
-Standard vision-language evaluation metrics including accuracy, BLEU, and human evaluation scores.
-### Results
-[Evaluation results to be added]
-#### Summary
-This LoRA adapter provides an efficient way to adapt the Qwen2.5-VL-32B-Instruct model for Traditional Chinese Medicine tongue diagnosis tasks while maintaining the base model's capabilities.
-## Technical Specifications
-### Model Architecture and Objective
-- **Architecture:** LoRA adapter for Qwen2.5-VL-32B-Instruct
-- **Objective:** Multimodal vision-language understanding and generation, specialized for TCM tongue diagnosis
-### Compute Infrastructure
-#### Hardware
-[To be specified]
-#### Software
-- PEFT 0.15.2
-- Transformers library
-- PyTorch
-## Citation
-**BibTeX:**
-```bibtex
-@misc{vitcm-llm,
-  author = {Mark-CHAE},
-  title = {ViTCM_LLM: Traditional Chinese Medicine Tongue Diagnosis Model},
-  year = {2024},
-  url = {https://huggingface.co/Mark-CHAE/shezhen}
-}
-```
-**APA:**
-Mark-CHAE. (2024). ViTCM_LLM: Traditional Chinese Medicine Tongue Diagnosis Model. Hugging Face. https://huggingface.co/Mark-CHAE/shezhen
-## Model Card Contact
-For questions about this model, please contact the model author.
-### Framework versions
 - PEFT 0.15.2

+---
+language:
+- en
+- ko
+- zh
+license: apache-2.0
+library_name: peft
+pipeline_tag: visual-question-answering
+tags:
+- vision
+- visual-question-answering
+- multimodal
+- qwen
+- lora
+- tcm
+- traditional-chinese-medicine
+- tongue-diagnosis
+---
+# ViTCM_LLM - Traditional Chinese Medicine Tongue Diagnosis Model
+This is a LoRA (Low-Rank Adaptation) adapter for the Qwen2.5-VL-32B-Instruct model, fine-tuned specifically for Traditional Chinese Medicine (TCM) tongue diagnosis tasks.
+## Model Details
+### Model Description
+- **Developed by:** Mark-CHAE
+- **Model type:** LoRA Adapter for Qwen2.5-VL-32B-Instruct
+- **Language(s) (NLP):** Chinese, Korean, English
+- **License:** Apache-2.0
+- **Finetuned from model:** Qwen/Qwen2.5-VL-32B-Instruct
+- **Specialization:** Traditional Chinese Medicine Tongue Diagnosis
+### Model Sources
+- **Repository:** [Mark-CHAE/shezhen](https://huggingface.co/Mark-CHAE/shezhen)
+- **Base Model:** [Qwen/Qwen2.5-VL-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-32B-Instruct)
+## Uses
+### Direct Use
+This LoRA adapter can be used with the base Qwen2.5-VL-32B-Instruct model for multimodal vision-language tasks including:
+- Traditional Chinese Medicine tongue diagnosis
+- Tongue image analysis and interpretation
+- Visual question answering for medical images
+- Multimodal medical conversations
+- Symptom analysis from tongue images
+### Downstream Use
+The adapter can be loaded with the base model for inference or further fine-tuning on specific TCM diagnosis tasks.
+### Out-of-Scope Use
+This model should not be used for:
+- Generating harmful, offensive, or inappropriate content
+- Creating deepfakes or misleading visual content
+- Any illegal activities
+- Making actual medical diagnoses without proper medical supervision
+### Recommendations
+Users should:
+- Verify outputs for accuracy and appropriateness
+- Be aware of potential biases in the model
+- Use appropriate safety measures when deploying
+- Not rely solely on this model for medical diagnosis
+- Consult qualified medical professionals for actual diagnosis
+## How to Get Started with the Model
+### Using the Inference Widget
+You can try the model directly in the browser using the Visual Question Answering widget above. Simply upload a tongue image and ask a question about it.
+### Using the Model in Code
+```python
+import torch
+from peft import PeftModel
+from PIL import Image
+from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
+# Load the base model and its processor (the processor wraps the tokenizer)
+base_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
+    "Qwen/Qwen2.5-VL-32B-Instruct",
+    torch_dtype=torch.float16,
+    device_map="auto"
+)
+processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-32B-Instruct")
+# Load LoRA adapter
+model = PeftModel.from_pretrained(base_model, "Mark-CHAE/shezhen")
+model.eval()
+# Prepare inputs; the chat template inserts the image placeholder tokens
+image = Image.open("tongue_image.jpg").convert("RGB")
+question = "根据图片判断舌诊内容"  # "Assess the tongue diagnosis findings from the image"
+messages = [{"role": "user", "content": [{"type": "image"}, {"type": "text", "text": question}]}]
+prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)
+# Generate response
+with torch.no_grad():
+    outputs = model.generate(
+        **inputs,
+        max_new_tokens=512,  # limits generated tokens only; image tokens already fill the prompt
+        temperature=0.7,
+        top_p=0.9,
+        do_sample=True
+    )
+# Decode only the newly generated tokens, skipping the prompt
+new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
+answer = processor.decode(new_tokens, skip_special_tokens=True).strip()
+print(answer)
+```
+## Training Details
+### Training Data
+The model was fine-tuned on multimodal vision-language data including Chinese, Korean, and English content, with specific focus on Traditional Chinese Medicine tongue diagnosis scenarios.
+### Training Procedure
+#### Training Hyperparameters
+- **Training regime:** LoRA fine-tuning
+- **LoRA rank:** 64
+- **LoRA alpha:** 128
+- **Target modules:** v_proj, qkv, attn.proj, q_proj, gate_proj, down_proj, up_proj, o_proj, k_proj
+#### Speeds, Sizes, Times
+- **Adapter size:** 2.2GB
+- **Base model:** Qwen2.5-VL-32B-Instruct (32B parameters)
+## Evaluation
+### Testing Data, Factors & Metrics
+#### Testing Data
+Evaluation was performed on multimodal vision-language benchmarks with focus on medical image understanding and TCM tongue diagnosis.
+#### Metrics
+Standard vision-language evaluation metrics including accuracy, BLEU, and human evaluation scores.
+### Results
+[Evaluation results to be added]
+#### Summary
+This LoRA adapter provides an efficient way to adapt the Qwen2.5-VL-32B-Instruct model for Traditional Chinese Medicine tongue diagnosis tasks while maintaining the base model's capabilities.
+## Technical Specifications
+### Model Architecture and Objective
+- **Architecture:** LoRA adapter for Qwen2.5-VL-32B-Instruct
+- **Objective:** Multimodal vision-language understanding and generation, specialized for TCM tongue diagnosis
+### Compute Infrastructure
+#### Software
+- PEFT 0.15.2
+- Transformers library
+- PyTorch
+**APA:**
+Mark-CHAE. (2024). ViTCM_LLM: Traditional Chinese Medicine Tongue Diagnosis Model. Hugging Face. https://huggingface.co/Mark-CHAE/shezhen
+## Model Card Contact
+For questions about this model, please contact the model author.
+### Framework versions
 - PEFT 0.15.2
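
The Training Hyperparameters section of the card lists a LoRA rank of 64 and an alpha of 128. The sketch below works through what those two numbers imply, assuming the standard LoRA formulation (update scaled by alpha / rank, not the rsLoRA variant). The `LoraShape` class and the 1024 x 1024 layer size are hypothetical, chosen only for illustration; the card does not state the actual projection dimensions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraShape:
    """Rank and alpha as listed in the card; layer dims come from the caller."""

    rank: int = 64
    alpha: int = 128

    @property
    def scaling(self) -> float:
        # Standard LoRA multiplies the low-rank update B @ A by alpha / rank.
        return self.alpha / self.rank

    def extra_params(self, d_in: int, d_out: int) -> int:
        # A has shape (rank, d_in) and B has shape (d_out, rank).
        return self.rank * (d_in + d_out)


shape = LoraShape()
print(shape.scaling)  # 2.0

# Hypothetical 1024 x 1024 projection (an assumption, not taken from the card).
d_in, d_out = 1024, 1024
print(shape.extra_params(d_in, d_out))  # 131072
print(d_in * d_out)  # 1048576 weights in the frozen full matrix
```

Under these assumptions the adapter adds one eighth as many trainable weights as the frozen matrix it adapts, and the learned update is applied at twice its raw magnitude.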