docs: Readme Updated for optimized Usage with transformers library
#60
by sayed99 · opened

README.md CHANGED
@@ -72,9 +72,11 @@ PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vi
## News
-* ```2025.10.16``` 🚀 We release [PaddleOCR-VL](https://github.com/PaddlePaddle/PaddleOCR), a multilingual document parsing solution powered by a 0.9B ultra-compact vision-language model, with SOTA performance.
-* ```2025.10.29``` PaddleOCR-VL's core module, PaddleOCR-VL-0.9B, can now be called via the `transformers` library.

## Usage
@@ -113,15 +115,25 @@ for res in output:
### Accelerate VLM Inference via Optimized Inference Servers

-1. Start the VLM inference server

2. Call the PaddleOCR CLI or Python API:

```bash
@@ -130,6 +142,7 @@ for res in output:
--vl_rec_backend vllm-server \
--vl_rec_server_url http://127.0.0.1:8080/v1
```
```python
from paddleocr import PaddleOCRVL
pipeline = PaddleOCRVL(vl_rec_backend="vllm-server", vl_rec_server_url="http://127.0.0.1:8080/v1")
@@ -154,9 +167,14 @@ from PIL import Image
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

-CHOSEN_TASK = "ocr"  # Options: 'ocr' | 'table' | 'chart' | 'formula'
PROMPTS = {
    "ocr": "OCR:",
    "table": "Table Recognition:",
@@ -164,8 +182,6 @@ PROMPTS = {
    "chart": "Chart Recognition:",
}

-model_path = "PaddlePaddle/PaddleOCR-VL"
-image_path = "test.png"
image = Image.open(image_path).convert("RGB")

model = AutoModelForCausalLM.from_pretrained(
@@ -177,7 +193,7 @@ messages = [
    {"role": "user",
     "content": [
         {"type": "image", "image": image},
-        {"type": "text", "text": PROMPTS[CHOSEN_TASK]},
     ]
    }
]
@@ -186,7 +202,7 @@ inputs = processor.apply_chat_template(
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
-
).to(DEVICE)

outputs = model.generate(**inputs, max_new_tokens=1024)
@@ -194,6 +210,73 @@ outputs = processor.batch_decode(outputs, skip_special_tokens=True)[0]
print(outputs)
```
## Performance

### Page-Level Document Parsing
@@ -346,4 +429,4 @@ If you find PaddleOCR-VL helpful, feel free to give us a star and citation.
primaryClass={cs.CV},
url={https://arxiv.org/abs/2510.14528},
}
-```

@@ -72,9 +72,11 @@ PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vi
## News

+* ```2025.11.07``` 🚀 Enabled `flash-attn` in the `transformers` library for faster inference with PaddleOCR-VL-0.9B.
+* ```2025.11.04``` 🌟 PaddleOCR-VL-0.9B is now officially supported in `vLLM`.
+* ```2025.10.29``` 🤗 PaddleOCR-VL's core module, PaddleOCR-VL-0.9B, can now be called via the `transformers` library.
+* ```2025.10.16``` 🚀 We release [PaddleOCR-VL](https://github.com/PaddlePaddle/PaddleOCR), a multilingual document parsing solution powered by a 0.9B ultra-compact vision-language model, with SOTA performance.
## Usage

@@ -113,15 +115,25 @@ for res in output:
### Accelerate VLM Inference via Optimized Inference Servers
+1. Start the VLM inference server:
+
+   You can start the vLLM inference service using one of two methods:
+
+   - Method 1: PaddleOCR method
+
+     ```bash
+     docker run \
+       --rm \
+       --gpus all \
+       --network host \
+       ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-vllm-server:latest \
+       paddleocr genai_server --model_name PaddleOCR-VL-0.9B --host 0.0.0.0 --port 8080 --backend vllm
+     ```
+
+   - Method 2: vLLM method
+
+     [vLLM: PaddleOCR-VL Usage Guide](https://docs.vllm.ai/projects/recipes/en/latest/PaddlePaddle/PaddleOCR-VL.html)
+
2. Call the PaddleOCR CLI or Python API:

```bash
@@ -130,6 +142,7 @@ for res in output:
--vl_rec_backend vllm-server \
--vl_rec_server_url http://127.0.0.1:8080/v1
```
+
```python
from paddleocr import PaddleOCRVL
pipeline = PaddleOCRVL(vl_rec_backend="vllm-server", vl_rec_server_url="http://127.0.0.1:8080/v1")
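# Illustrative usage (not shown in this hunk), assuming the PaddleOCR 3.x result API
# already referenced in this README ("for res in output:"); adapt the input path as needed.
output = pipeline.predict("test.png")
for res in output:
    res.print()                               # print the parsed result
    res.save_to_json(save_path="output")      # structured JSON output
    res.save_to_markdown(save_path="output")  # Markdown rendering of the parsed page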
@@ -154,9 +167,14 @@ from PIL import Image
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

+# ---- Settings ----
+model_path = "PaddlePaddle/PaddleOCR-VL"
+image_path = "test.png"
+task = "ocr"  # Options: 'ocr' | 'table' | 'chart' | 'formula'
+# ------------------
+
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

PROMPTS = {
    "ocr": "OCR:",
    "table": "Table Recognition:",
@@ -164,8 +182,6 @@ PROMPTS = {
    "chart": "Chart Recognition:",
}

image = Image.open(image_path).convert("RGB")

model = AutoModelForCausalLM.from_pretrained(
@@ -177,7 +193,7 @@ messages = [
    {"role": "user",
     "content": [
         {"type": "image", "image": image},
+        {"type": "text", "text": PROMPTS[task]},
     ]
    }
]
@@ -186,7 +202,7 @@ inputs = processor.apply_chat_template(
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
+    return_tensors="pt"
).to(DEVICE)

outputs = model.generate(**inputs, max_new_tokens=1024)
@@ -194,6 +210,73 @@ outputs = processor.batch_decode(outputs, skip_special_tokens=True)[0]
print(outputs)
```
+<details>
+<summary>👉 Click to expand: Use flash-attn to boost performance and reduce memory usage</summary>
+
+```shell
+# Ensure flash-attn 2 is installed
+pip install flash-attn --no-build-isolation
+```
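A quick sanity check, illustrative and not part of this diff, can confirm that `flash_attn` imports before the example below requests `attn_implementation="flash_attention_2"`:

```python
# Illustrative check (assumes nothing beyond the pip install above):
import importlib.util

if importlib.util.find_spec("flash_attn") is None:
    raise RuntimeError(
        "flash-attn not found; install it with: pip install flash-attn --no-build-isolation"
    )
```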
+
+```python
+import torch
+from transformers import AutoModelForCausalLM, AutoProcessor
+from PIL import Image
+
+# ---- Settings ----
+model_path = "PaddlePaddle/PaddleOCR-VL"
+image_path = "test.png"
+task = "ocr"  # ← change to "table" | "chart" | "formula"
+# ------------------
+
+DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
+
+model = AutoModelForCausalLM.from_pretrained(
+    model_path,
+    trust_remote_code=True,
+    torch_dtype=torch.bfloat16,
+    attn_implementation="flash_attention_2",
+).to(dtype=torch.bfloat16, device=DEVICE).eval()
+processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
+
+PROMPTS = {
+    "ocr": "OCR:",
+    "table": "Table Recognition:",
+    "chart": "Chart Recognition:",
+    "formula": "Formula Recognition:",
+}
+messages = [
+    {
+        "role": "user",
+        "content": [
+            {"type": "image", "image": Image.open(image_path).convert("RGB")},
+            {"type": "text", "text": PROMPTS[task]}
+        ]
+    }
+]
+
+inputs = processor.apply_chat_template(
+    messages,
+    tokenize=True,
+    add_generation_prompt=True,
+    return_dict=True,
+    return_tensors="pt"
+).to(DEVICE)
+
+with torch.inference_mode():
+    out = model.generate(
+        **inputs,
+        max_new_tokens=1024,
+        do_sample=False,
+        use_cache=True
+    )
+
+outputs = processor.batch_decode(out, skip_special_tokens=True)[0]
+print(outputs)
+```
+
+</details>
+

## Performance

### Page-Level Document Parsing
@@ -346,4 +429,4 @@ If you find PaddleOCR-VL helpful, feel free to give us a star and citation.
primaryClass={cs.CV},
url={https://arxiv.org/abs/2510.14528},
}
+```