---
language:
- en
- zh
- es
- fr
- de
- ja
- ko
- ar
- hi
- ru
license: apache-2.0
tags:
- ocr
- vision-language
- qwen2-vl
- custom-model
- text-extraction
- document-ai
- high-accuracy
library_name: transformers
pipeline_tag: image-to-text
base_model: Qwen/Qwen2-VL-2B-Instruct
---

# textract-ai - FIXED VERSION ✅

**🎉 FIXED: Hub loading now works properly!**

A high-accuracy OCR model based on Qwen2-VL-2B-Instruct, now with proper Hugging Face Hub support.

## ✅ What's Fixed

- **Hub Loading**: `AutoModel.from_pretrained()` now works correctly
- **`from_pretrained` Method**: The custom model class now implements it
- **Configuration**: Fixed model configuration for Hub compatibility
- **Error Handling**: Improved error handling and fallbacks

## 🚀 Quick Start (NOW WORKS!)

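```python
from transformers import AutoModel
from PIL import Image

# Load the model from the Hub. trust_remote_code=True runs the custom
# model code shipped in the repository.
model = AutoModel.from_pretrained("BabaK07/textract-ai", trust_remote_code=True)

# Load the image and normalize it to RGB
image = Image.open("your_image.jpg").convert("RGB")

# Extract text (use_native=True is the high-accuracy mode)
result = model.generate_ocr_text(image, use_native=True)

# The result is a dict with the text, a confidence score, and a success flag
print(f"Text: {result['text']}")
print(f"Confidence: {result['confidence']:.1%}")
print(f"Success: {result['success']}")
```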
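
The result dictionary can be checked before its text is used downstream. The helper below is a minimal sketch that assumes only the three keys shown above (`text`, `confidence`, `success`); the function name `summarize_ocr_result` and the `min_confidence` threshold are hypothetical and not part of the model's API.

```python
from typing import Any, Dict, Optional


def summarize_ocr_result(result: Dict[str, Any], min_confidence: float = 0.5) -> Optional[str]:
    """Return the stripped text, or None if the result looks unreliable.

    Assumes the keys shown in the Quick Start ('text', 'confidence',
    'success'). min_confidence is an illustrative threshold, not a
    setting of the model itself.
    """
    # Reject results the model itself flagged as failed
    if not result.get("success", False):
        return None
    # Reject results whose reported confidence is below the threshold
    if float(result.get("confidence", 0.0)) < min_confidence:
        return None
    return str(result.get("text", "")).strip()
```

With the Quick Start variables in scope, `summarize_ocr_result(result)` returns the cleaned text or `None`, so callers can branch on a single value.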

## 📊 Performance

- 🎯 **Accuracy**: High-accuracy OCR, with reported confidence scores of up to 95%
- ⏱️ **Speed**: ~13 seconds per image at the high-quality setting
- 🌍 **Languages**: Multi-language support
- 💻 **Device**: CPU and GPU support
- 📄 **Documents**: Excellent for complex documents
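
Per-image speed depends on hardware and image size, so it may be worth measuring on your own setup. The wrapper below is a small sketch using only the standard library; `timed_ocr` is a hypothetical helper name, and it takes any callable so it does not assume anything about the model beyond what you pass in.

```python
import time
from typing import Any, Callable, Dict, Tuple


def timed_ocr(run_ocr: Callable[[Any], Dict[str, Any]], image: Any) -> Tuple[Dict[str, Any], float]:
    """Run one OCR call and return (result, elapsed_seconds)."""
    start = time.perf_counter()  # monotonic, high-resolution clock
    result = run_ocr(image)
    elapsed = time.perf_counter() - start
    return result, elapsed
```

For example, `timed_ocr(lambda img: model.generate_ocr_text(img, use_native=True), image)` would time a single high-accuracy call, assuming `model` and `image` are loaded as in the Quick Start.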

## 🛠️ Features

- ✅ **Hub Loading**: Works with `AutoModel.from_pretrained()`
- ✅ **High Accuracy**: Based on Qwen2-VL-2B-Instruct
- ✅ **Multi-language**: Tagged for English, Chinese, Spanish, French, German, Japanese, Korean, Arabic, Hindi, and Russian
- ✅ **Document OCR**: Excellent for invoices, forms, documents
- ✅ **Robust Processing**: Multiple extraction methods
- ✅ **Production Ready**: Error handling included

## 📝 Usage Examples

### Basic Usage
```python
from transformers import AutoModel
from PIL import Image

model = AutoModel.from_pretrained("BabaK07/textract-ai", trust_remote_code=True)
image = Image.open("document.jpg")
result = model.generate_ocr_text(image, use_native=True)
```

### High Accuracy Mode
```python
result = model.generate_ocr_text(image, use_native=True)  # Best accuracy
```

### Fast Mode
```python
result = model.generate_ocr_text(image, use_native=False)  # Faster processing
```
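
The two modes can be combined: try the fast mode first and rerun in high-accuracy mode only when the fast result looks weak. This is a sketch under assumptions, not a built-in feature. It assumes both modes return the `success` and `confidence` keys shown in the Quick Start; `ocr_with_fallback` and the `min_confidence` threshold are hypothetical.

```python
from typing import Any, Dict


def ocr_with_fallback(model: Any, image: Any, min_confidence: float = 0.8) -> Dict[str, Any]:
    """Try fast mode, then fall back to high-accuracy mode if needed.

    min_confidence is an illustrative cut-off; tune it for your documents.
    """
    # First pass: faster processing
    fast = model.generate_ocr_text(image, use_native=False)
    if fast.get("success") and fast.get("confidence", 0.0) >= min_confidence:
        return fast
    # Second pass: best accuracy, only when the fast result is weak
    return model.generate_ocr_text(image, use_native=True)
```

The trade-off is that weak images cost two calls instead of one, so this pays off only when most inputs pass the fast check.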

### File Path Input
```python
result = model.generate_ocr_text("path/to/your/image.jpg")
```
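
Because a file path is accepted directly, a folder of images can be processed in a loop. The sketch below assumes that path input works as shown above; `ocr_directory`, the suffix set, and the extra `error` key recorded on failure are hypothetical choices for illustration.

```python
from pathlib import Path
from typing import Any, Dict, Iterable

# Assumed set of image extensions; extend as needed
IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png")


def ocr_directory(model: Any, directory: str,
                  suffixes: Iterable[str] = IMAGE_SUFFIXES) -> Dict[str, Dict[str, Any]]:
    """Run OCR on every image file in a directory, keyed by file name."""
    allowed = {s.lower() for s in suffixes}
    results: Dict[str, Dict[str, Any]] = {}
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file() or path.suffix.lower() not in allowed:
            continue  # skip subdirectories and non-image files
        try:
            results[path.name] = model.generate_ocr_text(str(path))
        except Exception as exc:
            # Record the failure and keep going with the remaining files
            results[path.name] = {"text": "", "confidence": 0.0,
                                  "success": False, "error": str(exc)}
    return results
```

Catching exceptions per file keeps one unreadable image from aborting the whole batch.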

## 🔧 Installation

```bash
pip install torch transformers pillow
```
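
To confirm the install, the package versions can be read with the standard library. This is a minimal sketch; `installed_versions` is a hypothetical helper, and it only reports what is installed without checking any minimum version.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(
    packages: Iterable[str] = ("torch", "transformers", "pillow"),
) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed
    return versions
```

Calling `print(installed_versions())` after installation shows at a glance whether any of the three packages is missing.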

## 📈 Model Details

- **Base Model**: Qwen/Qwen2-VL-2B-Instruct
- **Model Size**: ~2.5B parameters
- **Architecture**: Vision-Language Transformer
- **Optimization**: OCR-specific processing
- **Training**: Custom OCR pipeline

## 🆚 Comparison

| Feature | Before (Broken) | After (FIXED) |
|---------|----------------|---------------|
| Hub Loading | ❌ ValueError | ✅ Works perfectly |
| from_pretrained | ❌ Missing | ✅ Implemented |
| AutoModel | ❌ Failed | ✅ Compatible |
| Configuration | ❌ Invalid | ✅ Proper config |

## 🎯 Use Cases

- **High-Accuracy OCR**: When accuracy is most important
- **Document Processing**: Complex invoices, forms, contracts
- **Multi-language Text**: International documents
- **Professional OCR**: Business and enterprise use
- **Research Applications**: Academic and research projects

## 🔗 Related Models

- **pixeltext-ai**: https://huggingface.co/BabaK07/pixeltext-ai (PaliGemma-based, faster)
- **Base Model**: https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct

## 📞 Support

For issues or questions, please check the model repository or contact the author.

---

**Status**: ✅ FIXED and ready for production use!