zai-org/GLM-4.5V-FP8

Image-Text-to-Text

compressed-tensors

zRzRzRzRzRzRzR committed 24 days ago

Commit c96e009 · 1 Parent(s): 58009ba

update

Files changed (1)

README.md +2 -2

@@ -17,7 +17,7 @@ library_name: transformers
 <p align="center">
     👋 Join our <a href="https://discord.com/invite/8cnQKdAprg" target="_blank">Discord</a> communities.
     <br>
-    📖 Check out the <a href="https://arxiv.org/abs/2507.01006" target="_blank">paper</a>.
     <br>
     📍 Access the GLM-V series models via API on the <a href="https://docs.z.ai/guides/vlm/glm-4.5v">ZhipuAI Open Platform</a>.
 </p>
@@ -41,7 +41,7 @@ Beyond benchmark performance, GLM-4.5V focuses on real-world usability. Through
 The model also introduces a **Thinking Mode** switch, allowing users to balance between quick responses and deep reasoning. This switch works the same as in the `GLM-4.5` language model.
-## Quick Start
 For more code information, please visit our [GitHub](https://github.com/zai-org/GLM-V/).

 <p align="center">
     👋 Join our <a href="https://discord.com/invite/8cnQKdAprg" target="_blank">Discord</a> communities.
     <br>
+    📖 Check out the <a href="https://github.com/zai-org/GLM-V/blob/main/resources/GLM-4.5V_technical_report.pdf" target="_blank">paper</a>.
     <br>
     📍 Access the GLM-V series models via API on the <a href="https://docs.z.ai/guides/vlm/glm-4.5v">ZhipuAI Open Platform</a>.
 </p>
 The model also introduces a **Thinking Mode** switch, allowing users to balance between quick responses and deep reasoning. This switch works the same as in the `GLM-4.5` language model.
+The special tokens `<|begin_of_box|>` and `<|end_of_box|>` in the response mark the answer’s bounding box in the image. The bounding box is given as four numbers, for example `[x1, y1, x2, y2]`, where `(x1, y1)` is the top-left corner and `(x2, y2)` is the bottom-right corner. The bracket style may vary (`[]`, `[[]]`, `()`, `<>`, etc.), but the meaning is the same: the brackets enclose the coordinates of the box. These coordinates are relative values between 0 and 1000, normalized to the image size.
 For more code information, please visit our [GitHub](https://github.com/zai-org/GLM-V/).
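
The box format described in the added README line can be post-processed with a few lines of Python. The following is only a minimal sketch, not code from the GLM-V repository: the helper names `extract_boxes` and `to_pixels` are hypothetical, and it assumes that the text between `<|begin_of_box|>` and `<|end_of_box|>` contains only coordinates, in groups of four, whatever bracket style is used.

```python
import re

# Matches the text between the two special tokens named in the README.
BOX_SPAN = re.compile(r"<\|begin_of_box\|>(.*?)<\|end_of_box\|>", re.DOTALL)
# Matches non-negative integers or decimals; brackets and commas are ignored.
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def extract_boxes(response: str) -> list[tuple[float, ...]]:
    """Return every (x1, y1, x2, y2) group found inside box tokens.

    Hypothetical helper: assumes each box span holds only coordinates.
    """
    boxes = []
    for span in BOX_SPAN.findall(response):
        numbers = [float(n) for n in NUMBER.findall(span)]
        # Consume the numbers four at a time; a trailing partial group is dropped.
        for i in range(0, len(numbers) - 3, 4):
            boxes.append(tuple(numbers[i:i + 4]))
    return boxes


def to_pixels(box: tuple[float, ...], width: int, height: int) -> tuple[float, ...]:
    """Scale a box from the 0-1000 relative range to pixel coordinates."""
    x1, y1, x2, y2 = box
    return (x1 * width / 1000, y1 * height / 1000,
            x2 * width / 1000, y2 * height / 1000)


if __name__ == "__main__":
    # Made-up response text, for illustration only.
    reply = "The cat is at <|begin_of_box|>[[100, 200, 500, 800]]<|end_of_box|>."
    for box in extract_boxes(reply):
        print(to_pixels(box, width=2000, height=1000))
```

Because only the digits are read, the same function handles `[]`, `[[]]`, `()` and `<>` spans alike. If the model places non-coordinate text inside the box tokens, this sketch would need a stricter pattern.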