Update README.md
README.md CHANGED
@@ -21,15 +21,11 @@ pipeline_tag: image-text-to-text
 
 We train and release "Cerule", a tiny yet powerful Vision Language Model based on Google's newly released [Gemma-2b](https://huggingface.co/google/gemma-2b) and Google's [SigLIP](https://huggingface.co/google/siglip-so400m-patch14-384).
 
-We utilise highly efficient data selection techniques with:
 ```
 - Pretraining stage : 650K images (A LAION Subset)
-- Finetuning stage : 695K images (SVIT-mix-665K modified
+- Finetuning stage : 695K images (SVIT-mix-665K - Bunny mix modified by BAAI)
 ```
-The training setup was `4xA100's 80GB` and took ~6 hours to pretrain and ~13 hours to finetune. We modify and adapt the training code from [
-
-🚨 Training code, Data and more details to release soon!
-
+The training setup was `4xA100's 80GB` and took ~6 hours to pretrain and ~13 hours to finetune. We modify and adapt the training code from [Bunny](https://github.com/BAAI-DCAI/Bunny).
 
 ---
 | Image | Example |
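For readers skimming the hunk above: Cerule pairs the SigLIP vision encoder with the Gemma-2b language model in the Bunny/LLaVA style. The sketch below only illustrates that wiring and is not the released Cerule code; the two-layer MLP projector and the way image tokens are prepended to the text embeddings are assumptions modelled on Bunny, while the two base checkpoints are the ones named in the README.

```
# Illustrative sketch of the SigLIP + Gemma-2b pairing described above.
# NOT the released Cerule implementation: the MLP projector and the
# image/text concatenation are assumptions modelled on LLaVA/Bunny-style VLMs.
import torch
import torch.nn as nn
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    SiglipImageProcessor,
    SiglipVisionModel,
)

VISION = "google/siglip-so400m-patch14-384"  # vision tower named in the README
LM = "google/gemma-2b"                       # language model named in the README

image_processor = SiglipImageProcessor.from_pretrained(VISION)
vision_tower = SiglipVisionModel.from_pretrained(VISION)

tokenizer = AutoTokenizer.from_pretrained(LM)
language_model = AutoModelForCausalLM.from_pretrained(LM)

# Hypothetical projector: a 2-layer MLP mapping SigLIP features (1152-d)
# into Gemma-2b's embedding space (2048-d).
projector = nn.Sequential(
    nn.Linear(vision_tower.config.hidden_size, language_model.config.hidden_size),
    nn.GELU(),
    nn.Linear(language_model.config.hidden_size, language_model.config.hidden_size),
)

@torch.no_grad()
def build_multimodal_embeddings(image, prompt: str) -> torch.Tensor:
    """Project image patch features into the LM space and prepend them to the prompt."""
    pixels = image_processor(images=image, return_tensors="pt").pixel_values
    image_tokens = projector(vision_tower(pixel_values=pixels).last_hidden_state)
    text_ids = tokenizer(prompt, return_tensors="pt").input_ids
    text_tokens = language_model.get_input_embeddings()(text_ids)
    # The combined sequence would be fed to the LM via `inputs_embeds`.
    return torch.cat([image_tokens, text_tokens], dim=1)
```

In the Bunny recipe this adapts, the pretraining stage typically trains only the projector and the finetuning stage also updates the language model; the README does not state whether Cerule deviates from that.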
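The training times in the diff above also give a quick back-of-the-envelope compute estimate (an approximation, not a figure reported by the authors):

```
# Rough GPU-hour estimate from the README: 4x A100 80GB,
# ~6 h pretraining and ~13 h finetuning (times are approximate).
gpus = 4
pretrain_gpu_hours = gpus * 6    # ~24 GPU-hours over the 650K-image LAION subset
finetune_gpu_hours = gpus * 13   # ~52 GPU-hours over the 695K-image SVIT-mix-665K (Bunny mix)
print(pretrain_gpu_hours + finetune_gpu_hours)  # ~76 A100 GPU-hours in total
```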