Add link to paper and project page (#3)

- Add link to paper and project page (a357c910307598c6ffacd79d1fa2c68594bfa4cb)

Co-authored-by: Niels Rogge <[email protected]>

README.md CHANGED
@@ -1,10 +1,10 @@
 ---
-library_name: transformers
-license: apache-2.0
-language:
-- en
 base_model:
 - HuggingFaceTB/SmolVLM-256M-Instruct
+language:
+- en
+library_name: transformers
+license: apache-2.0
 pipeline_tag: image-text-to-text
 ---

@@ -16,6 +16,8 @@ pipeline_tag: image-text-to-text
 </div>
 </div>

+This model was presented in the paper [SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion](https://huggingface.co/papers/2503.11576).
+
 ### 🚀 Features:
 - 🏷️ **DocTags for Efficient Tokenization** – Introduces DocTags, an efficient and minimal representation for documents that is fully compatible with **DoclingDocuments**.
 - 🔍 **OCR (Optical Character Recognition)** – Extracts text accurately from images.
@@ -39,7 +41,6 @@ pipeline_tag: image-text-to-text
 - 🧪 **Chemical Recognition**
 - 📙 **Datasets**

-
 ## ⌨️ Get started (code examples)

 You can use **transformers** or **vllm** to perform inference, and [Docling](https://github.com/docling-project/docling) to convert results to a variety of output formats (md, html, etc.):
@@ -145,7 +146,8 @@ sampling_params = SamplingParams(
 temperature=0.0,
 max_tokens=8192)

-chat_template = f"<|im_start|>User:<image>{PROMPT_TEXT}<end_of_utterance>\nAssistant:"
+chat_template = f"<|im_start|>User:<image>{PROMPT_TEXT}<end_of_utterance>
+Assistant:"

 image_files = sorted([f for f in os.listdir(IMAGE_DIR) if f.lower().endswith((".png", ".jpg", ".jpeg"))])

@@ -253,6 +255,8 @@ DocTags are integrated with Docling, which allows export to HTML, Markdown, and

 **Paper:** [arXiv](https://arxiv.org/abs/2503.11576)

+**Project Page:** [Hugging Face](https://huggingface.co/ds4sd/SmolDocling-256M-preview)
+
 **Citation:**
 ```
 @misc{nassar2025smoldoclingultracompactvisionlanguagemodel,
@@ -265,4 +269,4 @@ DocTags are integrated with Docling, which allows export to HTML, Markdown, and
 url={https://arxiv.org/abs/2503.11576},
 }
 ```
-**Demo:** [Coming soon]
+**Demo:** [Coming soon]
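
For context, the `chat_template` line touched in the fourth hunk sits inside the model card's vLLM example. The sketch below shows, under stated assumptions, how such a prompt is typically paired with a page image for offline vLLM inference; the model id, prompt text, and single-image flow are inferred from the snippets visible in this diff and are not a verbatim copy of the full README.

```python
# Minimal sketch, not the canonical README example.
# Assumptions: model repo id, prompt text, and a local test image "page_1.png".
from vllm import LLM, SamplingParams
from PIL import Image

MODEL_PATH = "ds4sd/SmolDocling-256M-preview"   # repo this model card belongs to
PROMPT_TEXT = "Convert page to Docling."        # assumed DocTags conversion prompt

llm = LLM(model=MODEL_PATH)
sampling_params = SamplingParams(temperature=0.0, max_tokens=8192)

# The chat template changed in the hunk above: an image placeholder plus the user
# prompt, followed by the assistant turn the model is expected to complete.
chat_template = f"<|im_start|>User:<image>{PROMPT_TEXT}<end_of_utterance>\nAssistant:"

image = Image.open("page_1.png").convert("RGB")
outputs = llm.generate(
    {"prompt": chat_template, "multi_modal_data": {"image": image}},
    sampling_params=sampling_params,
)
doctags = outputs[0].outputs[0].text  # DocTags markup emitted by the model
print(doctags)
```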
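
Similarly, the Features list notes that DocTags are fully compatible with DoclingDocuments, and the Get started section points to Docling for converting results into other formats. A rough sketch of that conversion step follows; the `DocTagsDocument`/`DoclingDocument` class and method names reflect my reading of the docling-core API and should be checked against the full README, which this diff only shows in part.

```python
# Sketch, assuming the docling-core DocTags helpers; placeholder inputs.
from docling_core.types.doc import DoclingDocument
from docling_core.types.doc.document import DocTagsDocument
from PIL import Image

doctags = "<doctag>...</doctag>"     # text generated by the model (placeholder)
image = Image.open("page_1.png")     # the page image that was fed to the model

# Pair the DocTags string with its source image and build a DoclingDocument.
doctags_doc = DocTagsDocument.from_doctags_and_image_pairs([doctags], [image])
doc = DoclingDocument(name="Document")
doc.load_from_doctags(doctags_doc)

# Export to Markdown; docling-core also provides HTML/JSON export helpers.
print(doc.export_to_markdown())
```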