keeeeenw committed · verified
Commit f598e6a · Parent: b2e8b6a

Update README.md

Files changed (1): README.md (+13 -6)
README.md CHANGED
@@ -14,7 +14,7 @@ base_model:
 - google/siglip-so400m-patch14-384
 ---
 
-# MicroLLaVA
+# MicroLLaVA-siglip-so400m
 
 A compact vision language model that you can pretrain and finetune on a single consumer GPU such as an NVIDIA RTX 4090 with 24 GB of VRAM.
 
@@ -69,17 +69,24 @@ print('model output:', output_text)
 print('running time:', genertaion_time)
 ```
 
-Example Image from Llava
+### Example Usage
 
+**Input Image:**
 ![Llava Input Image Example](https://llava-vl.github.io/static/images/view.jpg "Llava Input Image Example")
 
-Example output
+**Prompt:** "What are the things I should be cautious about when I visit here?"
 
-model output: When I visit the beach at the waterfront, I should be cautious about several things. First, I should be cautious about the water, as it is a popular spot for boating and fishing. The water is shallow and shallow, making it difficult for boats to navigate and navigate. Additionally, the water is not a suitable surface for boating, as it is too shallow for boating. Additionally, the water is not suitable for swimming or fishing, as it is too cold and wet. Lastly, I should be cautious about the presence of other boats, such as boats that are parked on the beach, or boats that are not visible from the water. These factors can lead to potential accidents or accidents, as they can cause damage to the boat and the other boats in the water.
+**Model Output:**
+```
+When I visit the beach at the waterfront, I should be cautious about several things. First, I should be cautious about the water, as it is a popular spot for boating and fishing. The water is shallow and shallow, making it difficult for boats to navigate and navigate. Additionally, the water is not a suitable surface for boating, as it is too shallow for boating. Additionally, the water is not suitable for swimming or fishing, as it is too cold and wet. Lastly, I should be cautious about the presence of other boats, such as boats that are parked on the beach, or boats that are not visible from the water. These factors can lead to potential accidents or accidents, as they can cause damage to the boat and the other boats in the water.
+```
 
+### Implementation Notes
 
-Note: for inference, I created the special class modeling_tinyllava_llama.py which loads the same chat template as the TinyLlava model for TinyLlama and connect the llm to the vision tower.
-This class may require additional dependencies such as PyTorch and Transformer library.
+For inference, I created a custom class [modeling_tinyllava_llama.py](https://huggingface.co/keeeeenw/MicroLlava-siglip-so400m/blob/main/modeling_tinyllava_llama.py) which:
+- Loads the same chat template as the TinyLlava model for TinyLlama
+- Connects the LLM to the vision tower
+- May require additional dependencies such as PyTorch and the Transformers library
 
 
 ---
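
Since the custom class is loaded as remote code, a runnable starting point may help. The sketch below is an assumption, not the repository's documented API: it guesses a TinyLLaVA-style `model.chat(...)` helper (suggested by the `print('model output:', output_text)` lines visible in the diff) and the config fields `tokenizer_model_max_length` and `tokenizer_padding_side` used in TinyLLaVA checkpoints. Verify the exact names against the full usage snippet in the README.

```python
# Minimal inference sketch for MicroLLaVA. Assumptions (not confirmed by the
# diff above): a TinyLLaVA-style model.chat() helper and the two tokenizer
# config fields read below.
from transformers import AutoModelForCausalLM, AutoTokenizer

hf_path = "keeeeenw/MicroLlava-siglip-so400m"

# trust_remote_code=True lets transformers import the custom
# modeling_tinyllava_llama.py shipped with the checkpoint.
model = AutoModelForCausalLM.from_pretrained(hf_path, trust_remote_code=True)
model.cuda()

config = model.config
tokenizer = AutoTokenizer.from_pretrained(
    hf_path,
    use_fast=False,
    model_max_length=config.tokenizer_model_max_length,  # assumed config field
    padding_side=config.tokenizer_padding_side,          # assumed config field
)

# The example image and prompt from the README section above.
prompt = "What are the things I should be cautious about when I visit here?"
image_url = "https://llava-vl.github.io/static/images/view.jpg"

# Assumed TinyLLaVA-style helper: returns the decoded answer plus the
# wall-clock generation time, matching the README's two print statements.
output_text, generation_time = model.chat(
    prompt=prompt, image=image_url, tokenizer=tokenizer
)

print("model output:", output_text)
print("running time:", generation_time)
```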