Add link to paper and project page (#3)

- Add link to paper and project page (a357c910307598c6ffacd79d1fa2c68594bfa4cb)

Co-authored-by: Niels Rogge <[email protected]>

README.md CHANGED
@@ -1,10 +1,10 @@
 ---
-library_name: transformers
-license: apache-2.0
-language:
-- en
 base_model:
 - HuggingFaceTB/SmolVLM-256M-Instruct
+language:
+- en
+library_name: transformers
+license: apache-2.0
 pipeline_tag: image-text-to-text
 ---

@@ -16,6 +16,8 @@ pipeline_tag: image-text-to-text
 </div>
 </div>

+This model was presented in the paper [SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion](https://huggingface.co/papers/2503.11576).
+
 ### 🚀 Features:
 - 🏷️ **DocTags for Efficient Tokenization** – Introduces DocTags, an efficient and minimal representation for documents that is fully compatible with **DoclingDocuments**.
 - 🔍 **OCR (Optical Character Recognition)** – Extracts text accurately from images.
@@ -39,7 +41,6 @@ pipeline_tag: image-text-to-text
 - 🧪 **Chemical Recognition**
 - 📙 **Datasets**

-
 ## ⌨️ Get started (code examples)

 You can use **transformers** or **vllm** to perform inference, and [Docling](https://github.com/docling-project/docling) to convert results to a variety of output formats (md, html, etc.):
@@ -145,7 +146,8 @@ sampling_params = SamplingParams(
 temperature=0.0,
 max_tokens=8192)

-chat_template = f"<|im_start|>User:<image>{PROMPT_TEXT}<end_of_utterance>\nAssistant:"
+chat_template = f"<|im_start|>User:<image>{PROMPT_TEXT}<end_of_utterance>
+Assistant:"

 image_files = sorted([f for f in os.listdir(IMAGE_DIR) if f.lower().endswith((".png", ".jpg", ".jpeg"))])

@@ -253,6 +255,8 @@ DocTags are integrated with Docling, which allows export to HTML, Markdown, and

 **Paper:** [arXiv](https://arxiv.org/abs/2503.11576)

+**Project Page:** [Hugging Face](https://huggingface.co/ds4sd/SmolDocling-256M-preview)
+
 **Citation:**
 ```
 @misc{nassar2025smoldoclingultracompactvisionlanguagemodel,
@@ -265,4 +269,4 @@ DocTags are integrated with Docling, which allows export to HTML, Markdown, and
 url={https://arxiv.org/abs/2503.11576},
 }
 ```
-**Demo:** [Coming soon]
+**Demo:** [Coming soon]
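
For context, the `chat_template` line touched in the fourth hunk sits inside the model card's vLLM example. The sketch below shows, under stated assumptions, how such a prompt is typically paired with a page image for offline vLLM inference; the model id, prompt text, and single-image flow are inferred from the snippets visible in this diff and are not a verbatim copy of the full README.

```python
# Minimal sketch, not the canonical README example.
# Assumptions: model repo id, prompt text, and a local test image "page_1.png".
from vllm import LLM, SamplingParams
from PIL import Image

MODEL_PATH = "ds4sd/SmolDocling-256M-preview"   # repo this model card belongs to
PROMPT_TEXT = "Convert page to Docling."        # assumed DocTags conversion prompt

llm = LLM(model=MODEL_PATH)
sampling_params = SamplingParams(temperature=0.0, max_tokens=8192)

# The chat template changed in the hunk above: an image placeholder plus the user
# prompt, followed by the assistant turn the model is expected to complete.
chat_template = f"<|im_start|>User:<image>{PROMPT_TEXT}<end_of_utterance>\nAssistant:"

image = Image.open("page_1.png").convert("RGB")
outputs = llm.generate(
    {"prompt": chat_template, "multi_modal_data": {"image": image}},
    sampling_params=sampling_params,
)
doctags = outputs[0].outputs[0].text  # DocTags markup emitted by the model
print(doctags)
```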
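
Similarly, the Features list notes that DocTags are fully compatible with DoclingDocuments, and the Get started section points to Docling for converting results into other formats. A rough sketch of that conversion step follows; the `DocTagsDocument`/`DoclingDocument` class and method names reflect my reading of the docling-core API and should be checked against the full README, which this diff only shows in part.

```python
# Sketch, assuming the docling-core DocTags helpers; placeholder inputs.
from docling_core.types.doc import DoclingDocument
from docling_core.types.doc.document import DocTagsDocument
from PIL import Image

doctags = "<doctag>...</doctag>"     # text generated by the model (placeholder)
image = Image.open("page_1.png")     # the page image that was fed to the model

# Pair the DocTags string with its source image and build a DoclingDocument.
doctags_doc = DocTagsDocument.from_doctags_and_image_pairs([doctags], [image])
doc = DoclingDocument(name="Document")
doc.load_from_doctags(doctags_doc)

# Export to Markdown; docling-core also provides HTML/JSON export helpers.
print(doc.export_to_markdown())
```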