---
language:
- en
- zh
- es
- fr
- de
- ja
- ko
- ar
- hi
- ru
license: apache-2.0
tags:
- ocr
- vision-language
- qwen2-vl
- custom-model
- text-extraction
- document-ai
- high-accuracy
library_name: transformers
pipeline_tag: image-to-text
base_model: Qwen/Qwen2-VL-2B-Instruct
---
# textract-ai - FIXED VERSION ✅
**🎉 FIXED: Hub loading now works properly!**
A high-accuracy OCR model based on Qwen2-VL-2B-Instruct, now with proper Hugging Face Hub support.
## ✅ What's Fixed
- **Hub Loading**: `AutoModel.from_pretrained()` now works correctly
- **from_pretrained Method**: Proper implementation added
- **Configuration**: Fixed model configuration for Hub compatibility
- **Error Handling**: Improved error handling and fallbacks
## 🚀 Quick Start (NOW WORKS!)
```python
from transformers import AutoModel
from PIL import Image
# Load model from Hub (FIXED!)
model = AutoModel.from_pretrained("BabaK07/textract-ai", trust_remote_code=True)
# Load image
image = Image.open("your_image.jpg")
# Extract text
result = model.generate_ocr_text(image, use_native=True)
print(f"Text: {result['text']}")
print(f"Confidence: {result['confidence']:.1%}")
print(f"Success: {result['success']}")
```
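The result returned in the Quick Start is a dictionary with `text`, `confidence`, and `success` keys. A minimal sketch of a guard around it is shown below; the helper name `extract_text_or_raise` and the 0.5 threshold are illustrative assumptions, not part of the model's API.

```python
def extract_text_or_raise(result, min_confidence=0.5):
    """Return the OCR text, or raise if the result looks unusable.

    Assumes `result` has the 'text', 'confidence', and 'success' keys
    shown in the Quick Start. The default threshold is an arbitrary example.
    """
    # Treat a missing 'success' key the same as a reported failure.
    if not result.get("success", False):
        raise RuntimeError("OCR reported failure")

    confidence = result.get("confidence", 0.0)
    if confidence < min_confidence:
        raise ValueError(
            f"Confidence {confidence:.1%} is below the {min_confidence:.1%} threshold"
        )

    return result.get("text", "").strip()
```

With a loaded model, this could be called as `extract_text_or_raise(model.generate_ocr_text(image, use_native=True))`.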
## 📊 Performance
- 🎯 **Accuracy**: High-accuracy OCR, with model-reported confidence of up to 95%
- ⏱️ **Speed**: ~13 seconds per image in high-quality mode
- 🌍 **Languages**: Multi-language support
- 💻 **Device**: CPU and GPU support
- 📄 **Documents**: Excellent for complex documents
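Per-image latency depends heavily on hardware, so it may be worth measuring on your own machine. The sketch below averages wall-clock time over a few calls; `mean_ocr_seconds` is a hypothetical helper, and it assumes only that the model exposes `generate_ocr_text` as in the Quick Start.

```python
import time


def mean_ocr_seconds(model, image, runs=3, **ocr_kwargs):
    """Average wall-clock seconds per generate_ocr_text call over `runs` calls."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Extra keyword arguments (e.g. use_native=True) are passed through.
        model.generate_ocr_text(image, **ocr_kwargs)
        durations.append(time.perf_counter() - start)
    return sum(durations) / len(durations)
```

For example, `mean_ocr_seconds(model, image, use_native=True)` would time the high-accuracy path.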
## 🛠️ Features
- ✅ **Hub Loading**: Works with `AutoModel.from_pretrained()`
- ✅ **High Accuracy**: Based on Qwen2-VL-2B-Instruct
- ✅ **Multi-language**: Supports many languages
- ✅ **Document OCR**: Excellent for invoices, forms, documents
- ✅ **Robust Processing**: Multiple extraction methods
- ✅ **Production Ready**: Error handling included
## 📝 Usage Examples
### Basic Usage
```python
from transformers import AutoModel
from PIL import Image
model = AutoModel.from_pretrained("BabaK07/textract-ai", trust_remote_code=True)
image = Image.open("document.jpg")
result = model.generate_ocr_text(image, use_native=True)
```
### High Accuracy Mode
```python
result = model.generate_ocr_text(image, use_native=True) # Best accuracy
```
### Fast Mode
```python
result = model.generate_ocr_text(image, use_native=False) # Faster processing
```
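The two modes can be combined: try the fast path first and retry with `use_native=True` only when the result is weak. This is a sketch, not the model's built-in behaviour. The helper `ocr_with_fallback` and the 0.8 threshold are assumptions, and it assumes both modes return the same `text`/`confidence`/`success` dictionary.

```python
def ocr_with_fallback(model, image, min_confidence=0.8):
    """Run fast mode first; fall back to native mode if the result is weak."""
    fast = model.generate_ocr_text(image, use_native=False)

    # Accept the fast result only if it succeeded with enough confidence.
    if fast.get("success") and fast.get("confidence", 0.0) >= min_confidence:
        return fast

    # Otherwise pay the extra latency for the high-accuracy path.
    return model.generate_ocr_text(image, use_native=True)
```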
### File Path Input
```python
result = model.generate_ocr_text("path/to/your/image.jpg")
```
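Because a file path is accepted directly, a folder of images can be processed in a simple loop. The following is a rough sketch: `ocr_directory` and the suffix list are illustrative assumptions, and the `error` key is added by this helper rather than by the model.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to the formats you actually use.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def ocr_directory(model, folder):
    """Run OCR on every image file in `folder`, keyed by file name."""
    results = {}
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        try:
            results[path.name] = model.generate_ocr_text(str(path))
        except Exception as exc:
            # Record the failure and keep going so one bad file
            # does not abort the whole batch.
            results[path.name] = {
                "text": "",
                "confidence": 0.0,
                "success": False,
                "error": str(exc),
            }
    return results
```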
## 🔧 Installation
```bash
pip install torch transformers pillow
```
## 📈 Model Details
- **Base Model**: Qwen/Qwen2-VL-2B-Instruct
- **Model Size**: ~2.5B parameters
- **Architecture**: Vision-Language Transformer
- **Optimization**: OCR-specific processing
- **Training**: Custom OCR pipeline
## 🆚 Comparison
| Feature | Before (Broken) | After (FIXED) |
|---------|----------------|---------------|
| Hub Loading | ❌ ValueError | ✅ Works perfectly |
| from_pretrained | ❌ Missing | ✅ Implemented |
| AutoModel | ❌ Failed | ✅ Compatible |
| Configuration | ❌ Invalid | ✅ Proper config |
## 🎯 Use Cases
- **High-Accuracy OCR**: When accuracy is most important
- **Document Processing**: Complex invoices, forms, contracts
- **Multi-language Text**: International documents
- **Professional OCR**: Business and enterprise use
- **Research Applications**: Academic and research projects
## 🔗 Related Models
- **pixeltext-ai**: https://huggingface.co/BabaK07/pixeltext-ai (PaliGemma-based, faster)
- **Base Model**: https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct
## 📞 Support
For issues or questions, please check the model repository or contact the author.
---
**Status**: ✅ FIXED and ready for production use!