colesmcintosh committed · Commit d5cce9f · verified · 1 parent: 6ea4c71

Add handler.py to resolve “no handler.py file found” deployment error


### Why

Deploying **allenai/olmOCR-7B-0725** on Hugging Face Inference Endpoints failed because the repository lacked a `handler.py`. Inference Endpoints requires a custom handler so the service knows how to load the model and run inference.
![Screenshot 2025-07-28 at 12.47.36 PM.png](https://cdn-uploads.huggingface.co/production/uploads/628463974695ad4fb6fa68b1/6KheGPpgNl-VCyzLQ9cH6.png)
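For context, Inference Endpoints looks for a `handler.py` at the repository root that exposes an `EndpointHandler` class of roughly this shape (a minimal sketch of the contract, not olmOCR-specific):

```python
from typing import Any


class EndpointHandler:
    def __init__(self, path: str = "") -> None:
        # `path` is the local directory of the deployed repository;
        # load the model and processor here, once, at startup.
        ...

    def __call__(self, data: dict[str, Any]) -> Any:
        # `data` is the deserialized JSON request body;
        # return a JSON-serializable prediction.
        ...
```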
### What changed

* **Added `handler.py`**, which (see the example request below):

  * Loads the model and processor from `allenai/olmOCR-7B-0725`
  * Accepts an image under the `inputs` key (as a PIL image, base64 string, or URL)
  * Reads `max_new_tokens` from `data["parameters"]`, defaulting to `256` if not supplied
  * Returns the OCR text under the `generated_text` key
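Once deployed, the endpoint can be exercised with a request like the following (a sketch: the endpoint URL and token are placeholders for your own deployment, and the image is sent base64-encoded under `inputs`):

```python
import base64

import requests

# Placeholder values; substitute your own endpoint URL and HF token.
ENDPOINT_URL = "https://<your-endpoint>.endpoints.huggingface.cloud"
HF_TOKEN = "hf_..."

with open("page.png", "rb") as f:
    payload = {
        "inputs": base64.b64encode(f.read()).decode("utf-8"),
        "parameters": {"max_new_tokens": 512},
    }

resp = requests.post(
    ENDPOINT_URL,
    headers={"Authorization": f"Bearer {HF_TOKEN}", "Content-Type": "application/json"},
    json=payload,
)
resp.raise_for_status()
print(resp.json()["generated_text"])
```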

Files changed (1)
  1. handler.py +61 -0
handler.py ADDED
@@ -0,0 +1,61 @@
+ import base64
+ import io
+ from typing import Any
+
+ import requests
+ import torch
+ from PIL import Image
+ from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
+
+ class EndpointHandler:
+     """
+     Handler for allenai/olmOCR-7B-0725 (a Qwen2.5-VL-based checkpoint).
+
+     Input:
+     {
+         "inputs": <PIL.Image | base64 str | URL>,
+         "parameters": {"max_new_tokens": <int, optional>}
+     }
+
+     Output: {"generated_text": <str>}
+     """
+
+     def __init__(self, path: str = "") -> None:
+         self.device = "cuda" if torch.cuda.is_available() else "cpu"
+         model_path = path or "allenai/olmOCR-7B-0725"
+         self.processor = AutoProcessor.from_pretrained(model_path)
+         # olmOCR-7B-0725 is Qwen2.5-VL-based, so it needs the vision-language
+         # generation class; AutoModelForSeq2SeqLM cannot load this architecture.
+         self.model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
+             model_path, torch_dtype="auto"
+         ).to(self.device).eval()
+
+     @staticmethod
+     def _load_image(image: Any) -> Image.Image:
+         """Accept a PIL image, an http(s) URL, or a base64-encoded string."""
+         if isinstance(image, Image.Image):
+             return image
+         if isinstance(image, str) and image.startswith(("http://", "https://")):
+             return Image.open(io.BytesIO(requests.get(image, timeout=30).content))
+         return Image.open(io.BytesIO(base64.b64decode(image)))
+
+     def __call__(self, data: dict) -> Any:
+         image = self._load_image(data.get("inputs"))
+         max_tokens = data.get("parameters", {}).get("max_new_tokens", 256)
+
+         # Qwen2.5-VL expects a chat-formatted prompt with an image placeholder;
+         # calling the processor with images alone yields no input_ids to generate from.
+         messages = [{"role": "user", "content": [
+             {"type": "image"},
+             {"type": "text", "text": "Return the plain text content of this image."},
+         ]}]
+         prompt = self.processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+         inputs = self.processor(text=[prompt], images=[image], return_tensors="pt").to(self.device)
+
+         with torch.inference_mode():
+             ids = self.model.generate(**inputs, max_new_tokens=max_tokens)
+
+         # Drop the prompt tokens so only the newly generated OCR text is returned.
+         new_ids = ids[:, inputs["input_ids"].shape[1]:]
+         text = self.processor.batch_decode(new_ids, skip_special_tokens=True)[0]
+         return {"generated_text": text}
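For a quick local smoke test before deploying, the handler can also be called directly (assuming `handler.py` is on the import path and a sample `page.png` exists; with no `path` argument it falls back to downloading `allenai/olmOCR-7B-0725` from the Hub):

```python
from PIL import Image

from handler import EndpointHandler

handler = EndpointHandler()
result = handler({"inputs": Image.open("page.png"), "parameters": {"max_new_tokens": 128}})
print(result["generated_text"])
```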