Update README.md

README.md CHANGED

@@ -19,6 +19,24 @@ datasets:
The CrystalChat-7B-based multi-modal large language model (MLLM) mimics the training recipe used for the Vicuna-7B-based [LLaVa-v1.5](https://huggingface.co/docs/transformers/main/model_doc/llava). CrystalChat-7B-based MLLMs are fully transparent: all materials, including code, data, model checkpoints, and intermediate results, are open-sourced, as described in [Web2Code: A Large-scale Webpage-to-Code Dataset
and Evaluation Framework for Multimodal LLMs](https://arxiv.org/pdf/2406.20098). The CrystalChat-7B-Web2Code MLLM is specialized in webpage image-to-HTML code generation.

+
+## CrystalChat-Web2Code Features
+
+**Convert hand-drawn images to a website**
+
+|  |  |
+|:----------------------:|:----------------------:|
+| Hand Drawn Webpage | CrystalChat-Web2Code Rendering |
+
+
+**Recreate a new webpage from an existing webpage**
+Image 1: Original Webpage
+<center><img src="images2/ori.png" alt="original webpage" /></center>
+
+Image 2: CrystalChat-Web2Code Rendering
+<center><img src="images2/crystalchat.png" alt="CrystalChat-Web2Code rendering" /></center>
+
+
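The comparisons above pair a source image with a rendering of the model's generated HTML. As a rough illustration of that last step, the sketch below writes a generated HTML string to disk and opens it in a local browser; it uses only the Python standard library, and the helper name, file name, and placeholder HTML are illustrative rather than anything specified by the README.

```python
import webbrowser
from pathlib import Path

def preview_generated_html(html: str, out_path: str = "crystalchat_output.html") -> None:
    """Write a model-generated HTML string to disk and open it in the default browser."""
    path = Path(out_path)
    path.write_text(html, encoding="utf-8")
    webbrowser.open(path.resolve().as_uri())  # render locally for side-by-side comparison

# Placeholder string standing in for real CrystalChat-Web2Code output.
preview_generated_html("<html><body><h1>Hello from CrystalChat-Web2Code</h1></body></html>")
```

Any HTML-to-image tool can replace the browser step when a static screenshot is needed for figures like those above.
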
## Web2Code Dataset
Our Web2Code instruction tuning dataset construction and instruction generation process
involves four key components:

@@ -142,29 +160,6 @@ The dataset chosen was created by LLaVA with academic-task-oriented VQA data mix

**Table 6:** Distribution of DWU and DWU<sub>R</sub> datasets. Both datasets include high-quality question-answer pairs for webpage understanding.

-
-
-## Examples
-
-
-
-**Example 1: Hand drawn images**
-
-|  |  |
-|:----------------------:|:----------------------:|
-| Hand Drawn Webpage | CrystalChat-Web2Code Rendering |
-
-
-**Example 2: Recreate a webpage from an image**
-Image 1: Original Webpage
-<center><img src="images2/ori.png" alt="k2 eval table" /></center>
-
-Image 2: CrystalChat-Web2Code Rendering
-<center><img src="images2/crystalchat.png" alt="k2 eval table" /></center>
-
-
-**Image 3:** Hand-drawn webpage input to CrystalChat-7B-Web2Code generated output.
-
## Loading Crystal
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
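# The README's snippet is cut off here by the diff context. The lines below are
# a minimal sketch of how such a checkpoint is typically loaded with the two
# classes imported above; the repo id "LLM360/CrystalChat-7B-Web2Code" and the
# trust_remote_code / device_map settings are assumptions, not confirmed by
# this diff.
model_id = "LLM360/CrystalChat-7B-Web2Code"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,  # custom multi-modal modelling code is assumed
    device_map="auto",       # place weights on available GPU(s)/CPU
)

# Text-only smoke test; webpage-image inputs additionally require the model's
# own image preprocessing (defined in its remote code), which this diff does
# not show.
prompt = "Write a minimal HTML page with a centered <h1> title."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))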