textract-ai - FIXED VERSION ✅
🎉 FIXED: Hub loading now works properly!
A high-accuracy OCR model based on Qwen2-VL-2B-Instruct, now with proper Hugging Face Hub support.
✅ What's Fixed
- Hub Loading: AutoModel.from_pretrained() now works correctly
- from_pretrained Method: Proper implementation added
- Configuration: Fixed model configuration for Hub compatibility
- Error Handling: Improved error handling and fallbacks
🚀 Quick Start (NOW WORKS!)
```python
from transformers import AutoModel
from PIL import Image

# Load model from Hub (FIXED!)
model = AutoModel.from_pretrained("BabaK07/textract-ai", trust_remote_code=True)

# Load image
image = Image.open("your_image.jpg")

# Extract text
result = model.generate_ocr_text(image, use_native=True)
print(f"Text: {result['text']}")
print(f"Confidence: {result['confidence']:.1%}")
print(f"Success: {result['success']}")
```
📊 Performance
- 🎯 Accuracy: High-accuracy OCR (confidence scores of up to 95%)
- ⏱️ Speed: ~13 seconds per image (high quality)
- 🌍 Languages: Multi-language support
- 💻 Device: CPU and GPU support
- 📄 Documents: Excellent for complex documents
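When images are processed one at a time, total runtime grows linearly with the number of images. The hypothetical helpers below turn the quoted ~13 seconds per image into a rough batch estimate; actual timings will depend on hardware, image size and mode.

```python
def estimate_batch_seconds(num_images: int, seconds_per_image: float = 13.0) -> float:
    """Rough sequential runtime, using the ~13 s/image figure quoted above."""
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    return num_images * seconds_per_image


def images_per_hour(seconds_per_image: float = 13.0) -> float:
    """Number of images that fit into one hour of sequential processing."""
    return 3600.0 / seconds_per_image


if __name__ == "__main__":
    total = estimate_batch_seconds(100)
    print(f"100 images: ~{total / 60:.1f} minutes")             # 1300 s, about 21.7 minutes
    print(f"Throughput: ~{images_per_hour():.0f} images/hour")  # 3600 / 13, about 277
```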
🛠️ Features
- ✅ Hub Loading: Works with AutoModel.from_pretrained()
- ✅ High Accuracy: Based on Qwen2-VL-2B-Instruct
- ✅ Multi-language: Supports many languages
- ✅ Document OCR: Excellent for invoices, forms, and other documents
- ✅ Robust Processing: Multiple extraction methods
- ✅ Production Ready: Error handling included
📝 Usage Examples
Basic Usage
```python
from transformers import AutoModel
from PIL import Image

model = AutoModel.from_pretrained("BabaK07/textract-ai", trust_remote_code=True)
image = Image.open("document.jpg")
result = model.generate_ocr_text(image, use_native=True)
```
High Accuracy Mode
```python
result = model.generate_ocr_text(image, use_native=True)  # Best accuracy
```
Fast Mode
```python
result = model.generate_ocr_text(image, use_native=False)  # Faster processing
```
File Path Input
```python
# A file path can be passed instead of a PIL image
result = model.generate_ocr_text("path/to/your/image.jpg")
```
🔧 Installation
```bash
pip install torch transformers pillow
```
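Before loading the model, you can confirm that the three packages are importable. Note that the pillow package is imported as PIL. The missing_packages helper below is a hypothetical sketch that uses only the standard library's importlib.util.find_spec.

```python
import importlib.util
from typing import Dict, List

# pip package name -> module name used in "import ..." (pillow is imported as PIL)
REQUIRED = {"torch": "torch", "transformers": "transformers", "pillow": "PIL"}


def missing_packages(required: Dict[str, str] = REQUIRED) -> List[str]:
    """Return the pip names of packages whose top-level module cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Install with: pip install " + " ".join(missing))
    else:
        print("All dependencies found.")
```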
📈 Model Details
- Base Model: Qwen/Qwen2-VL-2B-Instruct
- Model Size: ~2.5B parameters
- Architecture: Vision-Language Transformer
- Optimization: OCR-specific processing
- Training: Custom OCR pipeline
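The listed model size gives a rough lower bound on the memory needed just to hold the weights: parameter count times bytes per parameter. This sketch assumes the ~2.5B figure above and ignores activations, image-dependent buffers and framework overhead, so real memory use will be higher.

```python
# Bytes needed to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def weight_memory_gib(num_params: float, dtype: str = "float16") -> float:
    """Memory for the weights alone, in GiB (1 GiB = 2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    for dtype in ("float32", "bfloat16"):
        print(f"{dtype}: ~{weight_memory_gib(2.5e9, dtype):.1f} GiB")
```

For 2.5B parameters this gives about 9.3 GiB in float32 and about 4.7 GiB in float16 or bfloat16.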
🆚 Comparison
| Feature | Before (Broken) | After (FIXED) |
|---|---|---|
| Hub Loading | ❌ ValueError | ✅ Works perfectly |
| from_pretrained | ❌ Missing | ✅ Implemented |
| AutoModel | ❌ Failed | ✅ Compatible |
| Configuration | ❌ Invalid | ✅ Proper config |
🎯 Use Cases
- High-Accuracy OCR: When accuracy is most important
- Document Processing: Complex invoices, forms, contracts
- Multi-language Text: International documents
- Professional OCR: Business and enterprise use
- Research Applications: Academic and research projects
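In document-processing pipelines where accuracy matters most, one common pattern is to accept high-confidence results automatically and send the rest to manual review. The sketch below assumes result dicts shaped like the output of generate_ocr_text; the route_results name and the 0.8 threshold are illustrative choices, not values recommended by the model author.

```python
from typing import Dict, List, Tuple


def route_results(results: Dict[str, dict], threshold: float = 0.8) -> Tuple[List[str], List[str]]:
    """Split OCR results into auto-accepted and needs-review document IDs.

    A result is accepted only if success is True and confidence >= threshold.
    """
    accepted: List[str] = []
    review: List[str] = []
    for doc_id, result in results.items():
        ok = result.get("success", False) and result.get("confidence", 0.0) >= threshold
        (accepted if ok else review).append(doc_id)
    return accepted, review


if __name__ == "__main__":
    batch = {
        "invoice_001.jpg": {"text": "INVOICE", "confidence": 0.95, "success": True},
        "form_002.jpg": {"text": "FORM", "confidence": 0.62, "success": True},
        "contract_003.jpg": {"text": "", "confidence": 0.0, "success": False},
    }
    accepted, review = route_results(batch)
    print("accepted:", accepted)  # ['invoice_001.jpg']
    print("review:", review)      # ['form_002.jpg', 'contract_003.jpg']
```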
🔗 Related Models
- pixeltext-ai: https://huggingface.co/BabaK07/pixeltext-ai (PaliGemma-based, faster)
- Base Model: https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct
📞 Support
For issues or questions, please check the model repository or contact the author.
Status: ✅ FIXED and ready for production use!