---
language:
- en
- zh
- es
- fr
- de
- ja
- ko
- ar
- hi
- ru
license: apache-2.0
tags:
- ocr
- vision-language
- qwen2-vl
- custom-model
- text-extraction
- document-ai
- high-accuracy
library_name: transformers
pipeline_tag: image-to-text
base_model: Qwen/Qwen2-VL-2B-Instruct
---

# textract-ai - FIXED VERSION ✅

**🎉 FIXED: Hub loading now works properly!**

A high-accuracy OCR model based on Qwen2-VL-2B-Instruct, now with proper Hugging Face Hub support.

## ✅ What's Fixed

- **Hub Loading**: `AutoModel.from_pretrained()` now works correctly
- **`from_pretrained` Method**: Proper implementation added
- **Configuration**: Fixed model configuration for Hub compatibility
- **Error Handling**: Improved error handling and fallbacks

## 🚀 Quick Start (NOW WORKS!)

```python
from transformers import AutoModel
from PIL import Image

# Load the model from the Hub (trust_remote_code is required for the custom model class)
model = AutoModel.from_pretrained("BabaK07/textract-ai", trust_remote_code=True)

# Load the image
image = Image.open("your_image.jpg")

# Extract text
result = model.generate_ocr_text(image, use_native=True)

print(f"Text: {result['text']}")
print(f"Confidence: {result['confidence']:.1%}")
print(f"Success: {result['success']}")
```

## 📊 Performance

- 🎯 **Accuracy**: High-accuracy OCR, with reported confidence scores of up to 95%
- ⏱️ **Speed**: About 13 seconds per image in high-quality mode
- 🌍 **Languages**: Multi-language support
- 💻 **Device**: Runs on CPU and GPU
- 📄 **Documents**: Excellent for complex documents

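The per-image time will depend on your hardware and image size, so it can be worth measuring it yourself. The sketch below is a minimal timing helper, not part of the model's API: `time_ocr` and its `runs` parameter are hypothetical names, and the helper only assumes that the model exposes `generate_ocr_text` as shown in the Quick Start.

```python
import time
from typing import Any


def time_ocr(model: Any, image: Any, runs: int = 3, **ocr_kwargs: Any) -> float:
    """Return the mean wall-clock seconds per call to model.generate_ocr_text.

    Hypothetical helper: `model` is any object with a generate_ocr_text method;
    extra keyword arguments (for example use_native=True) are passed through.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        model.generate_ocr_text(image, **ocr_kwargs)
        durations.append(time.perf_counter() - start)
    return sum(durations) / len(durations)
```

For example, `time_ocr(model, image, use_native=True)` averages three runs in high-accuracy mode. The first call may be slower than later ones because of one-time warm-up costs.
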
## 🛠️ Features

- ✅ **Hub Loading**: Works with `AutoModel.from_pretrained()`
- ✅ **High Accuracy**: Based on Qwen2-VL-2B-Instruct
- ✅ **Multi-language**: Supports many languages
- ✅ **Document OCR**: Excellent for invoices, forms, documents
- ✅ **Robust Processing**: Multiple extraction methods
- ✅ **Production Ready**: Error handling included

## 📝 Usage Examples

### Basic Usage
```python
from transformers import AutoModel
from PIL import Image

model = AutoModel.from_pretrained("BabaK07/textract-ai", trust_remote_code=True)
image = Image.open("document.jpg")
result = model.generate_ocr_text(image, use_native=True)
```

### High Accuracy Mode
```python
result = model.generate_ocr_text(image, use_native=True)  # Best accuracy
```

### Fast Mode
```python
result = model.generate_ocr_text(image, use_native=False)  # Faster processing
```

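The two modes can be combined: try the fast path first and rerun in high-accuracy mode only when the result looks weak. The following is a minimal sketch under stated assumptions, not a documented feature of the model. `ocr_with_fallback` and `min_confidence` are hypothetical names, the 0.8 threshold is arbitrary, and the code assumes the result is a dict with the `success` and `confidence` keys shown in the Quick Start, with `confidence` expressed as a fraction between 0 and 1.

```python
from typing import Any, Dict


def ocr_with_fallback(model: Any, image: Any, min_confidence: float = 0.8) -> Dict[str, Any]:
    """Run fast mode first; fall back to native mode if the result is weak.

    Hypothetical helper. Assumes the result dict carries 'success' and
    'confidence' keys, as in the Quick Start example.
    """
    fast = model.generate_ocr_text(image, use_native=False)
    if fast.get("success") and fast.get("confidence", 0.0) >= min_confidence:
        return fast
    # Fast result failed or fell below the threshold: pay for the slower pass.
    return model.generate_ocr_text(image, use_native=True)
```

Tune `min_confidence` on your own documents; a higher value sends more images through the slower pass.
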
### File Path Input
```python
result = model.generate_ocr_text("path/to/your/image.jpg")
```

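Because a file path is accepted directly, processing a whole folder is a short loop. The sketch below is one possible way to do it and is not part of the model's API: `ocr_directory`, its `patterns` parameter, and the extra `error` key recorded on failure are all hypothetical choices made for illustration.

```python
from pathlib import Path
from typing import Any, Dict, Iterable


def ocr_directory(
    model: Any,
    folder: str,
    patterns: Iterable[str] = ("*.jpg", "*.png"),
) -> Dict[str, Dict[str, Any]]:
    """Run OCR on every matching file in `folder`, keyed by file name.

    Hypothetical helper. A failure on one file is recorded and does not
    stop the rest of the batch.
    """
    results: Dict[str, Dict[str, Any]] = {}
    for pattern in patterns:
        for path in sorted(Path(folder).glob(pattern)):
            try:
                results[path.name] = model.generate_ocr_text(str(path))
            except Exception as exc:  # keep going; note the failure
                results[path.name] = {
                    "text": "",
                    "confidence": 0.0,
                    "success": False,
                    "error": str(exc),  # illustrative key, not from the model
                }
    return results
```

Calling `ocr_directory(model, "scans")` returns one result dict per image, so failed files can be filtered afterwards with `result["success"]`.
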
## 🔧 Installation

```bash
pip install torch transformers pillow
```

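To confirm that the packages are importable before loading the model, a quick check such as the one below may help. It is a small sketch using only the standard library; `missing_modules` is a hypothetical helper name. Note that the `pillow` package is imported as `PIL`.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str] = ("torch", "transformers", "PIL")) -> List[str]:
    """Return the top-level module names from `names` that cannot be found.

    Hypothetical helper. Only checks that each module can be located,
    not that its version is compatible.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]
```

An empty list from `missing_modules()` means all three dependencies were found in the current environment.
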
## 📈 Model Details

- **Base Model**: Qwen/Qwen2-VL-2B-Instruct
- **Model Size**: ~2.5B parameters
- **Architecture**: Vision-Language Transformer
- **Optimization**: OCR-specific processing
- **Training**: Custom OCR pipeline

## 🆚 Comparison

| Feature | Before (Broken) | After (FIXED) |
|---------|-----------------|---------------|
| Hub Loading | ❌ ValueError | ✅ Works perfectly |
| from_pretrained | ❌ Missing | ✅ Implemented |
| AutoModel | ❌ Failed | ✅ Compatible |
| Configuration | ❌ Invalid | ✅ Proper config |

## 🎯 Use Cases

- **High-Accuracy OCR**: When accuracy is most important
- **Document Processing**: Complex invoices, forms, contracts
- **Multi-language Text**: International documents
- **Professional OCR**: Business and enterprise use
- **Research Applications**: Academic and research projects

## 🔗 Related Models

- **pixeltext-ai**: https://huggingface.co/BabaK07/pixeltext-ai (PaliGemma-based, faster)
- **Base Model**: https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct

## 📞 Support

For issues or questions, please check the model repository or contact the author.

---

**Status**: ✅ FIXED and ready for production use!