keeeeenw committed · verified
Commit f598e6a · Parent: b2e8b6a

Update README.md

Files changed (1): README.md (+13 -6)
README.md CHANGED
@@ -14,7 +14,7 @@ base_model:
 - google/siglip-so400m-patch14-384
 ---
 
-# MicroLLaVA
+# MicroLLaVA-siglip-so400m
 
 A compact vision language model that you can pretrain and finetune on a single consumer GPU such as an NVIDIA RTX 4090 with 24 GB of VRAM.
 
@@ -69,17 +69,24 @@ print('model output:', output_text)
 print('running time:', genertaion_time)
 ```
 
-Example Image from Llava
+### Example Usage
 
+**Input Image:**
 ![Llava Input Image Example](https://llava-vl.github.io/static/images/view.jpg "Llava Input Image Example")
 
-Example output
+**Prompt:** "What are the things I should be cautious about when I visit here?"
 
-model output: When I visit the beach at the waterfront, I should be cautious about several things. First, I should be cautious about the water, as it is a popular spot for boating and fishing. The water is shallow and shallow, making it difficult for boats to navigate and navigate. Additionally, the water is not a suitable surface for boating, as it is too shallow for boating. Additionally, the water is not suitable for swimming or fishing, as it is too cold and wet. Lastly, I should be cautious about the presence of other boats, such as boats that are parked on the beach, or boats that are not visible from the water. These factors can lead to potential accidents or accidents, as they can cause damage to the boat and the other boats in the water.
+**Model Output:**
+```
+When I visit the beach at the waterfront, I should be cautious about several things. First, I should be cautious about the water, as it is a popular spot for boating and fishing. The water is shallow and shallow, making it difficult for boats to navigate and navigate. Additionally, the water is not a suitable surface for boating, as it is too shallow for boating. Additionally, the water is not suitable for swimming or fishing, as it is too cold and wet. Lastly, I should be cautious about the presence of other boats, such as boats that are parked on the beach, or boats that are not visible from the water. These factors can lead to potential accidents or accidents, as they can cause damage to the boat and the other boats in the water.
+```
 
+### Implementation Notes
 
-Note: for inference, I created the special class modeling_tinyllava_llama.py which loads the same chat template as the TinyLlava model for TinyLlama and connect the llm to the vision tower.
-This class may require additional dependencies such as PyTorch and Transformer library.
+For inference, I created a custom class [modeling_tinyllava_llama.py](https://huggingface.co/keeeeenw/MicroLlava-siglip-so400m/blob/main/modeling_tinyllava_llama.py) which:
+- Loads the same chat template as the TinyLlava model for TinyLlama
+- Connects the LLM to the vision tower
+- May require additional dependencies such as PyTorch and the Transformers library
 
 
 ---
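
Since the custom class is loaded as remote code, a runnable starting point may help. The sketch below is an assumption, not the repository's documented API: it guesses a TinyLLaVA-style `model.chat(...)` helper (suggested by the `print('model output:', output_text)` lines visible in the diff) and the config fields `tokenizer_model_max_length` and `tokenizer_padding_side` used in TinyLLaVA checkpoints. Verify the exact names against the full usage snippet in the README.

```python
# Minimal inference sketch for MicroLLaVA. Assumptions (not confirmed by the
# diff above): a TinyLLaVA-style model.chat() helper and the two tokenizer
# config fields read below.
from transformers import AutoModelForCausalLM, AutoTokenizer

hf_path = "keeeeenw/MicroLlava-siglip-so400m"

# trust_remote_code=True lets transformers import the custom
# modeling_tinyllava_llama.py shipped with the checkpoint.
model = AutoModelForCausalLM.from_pretrained(hf_path, trust_remote_code=True)
model.cuda()

config = model.config
tokenizer = AutoTokenizer.from_pretrained(
    hf_path,
    use_fast=False,
    model_max_length=config.tokenizer_model_max_length,  # assumed config field
    padding_side=config.tokenizer_padding_side,          # assumed config field
)

# The example image and prompt from the README section above.
prompt = "What are the things I should be cautious about when I visit here?"
image_url = "https://llava-vl.github.io/static/images/view.jpg"

# Assumed TinyLLaVA-style helper: returns the decoded answer plus the
# wall-clock generation time, matching the README's two print statements.
output_text, generation_time = model.chat(
    prompt=prompt, image=image_url, tokenizer=tokenizer
)

print("model output:", output_text)
print("running time:", generation_time)
```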