textract-ai - FIXED VERSION ✅

🎉 FIXED: Hub loading now works properly!

A high-accuracy OCR model based on Qwen2-VL-2B-Instruct, now with proper Hugging Face Hub support.

✅ What's Fixed

Hub Loading: AutoModel.from_pretrained() now works correctly
from_pretrained Method: Proper implementation added
Configuration: Fixed model configuration for Hub compatibility
Error Handling: Improved error handling and fallbacks

🚀 Quick Start (NOW WORKS!)

from transformers import AutoModel
from PIL import Image

# Load model from Hub (FIXED!); trust_remote_code=True runs the custom model code shipped in the repository
model = AutoModel.from_pretrained("BabaK07/textract-ai", trust_remote_code=True)

# Load image
image = Image.open("your_image.jpg")

# Extract text
result = model.generate_ocr_text(image, use_native=True)

print(f"Text: {result['text']}")
print(f"Confidence: {result['confidence']:.1%}")
print(f"Success: {result['success']}")
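
The `success` flag in the returned dictionary can be checked before the text is used. A minimal sketch, assuming the result always carries the `text`, `confidence`, and `success` keys printed above; the helper name `text_or_none` and the `min_confidence` threshold are illustrative and not part of the model's API:

```python
from typing import Optional


def text_or_none(result: dict, min_confidence: float = 0.5) -> Optional[str]:
    """Return the extracted text, or None if OCR failed or confidence is low.

    Assumes `result` has the keys shown in the Quick Start output:
    'text', 'confidence', and 'success'. The 0.5 default is an arbitrary
    illustrative threshold, not a value recommended by the model author.
    """
    if not result.get("success", False):
        return None
    if result.get("confidence", 0.0) < min_confidence:
        return None
    return result.get("text", "").strip()


# Example with a hand-written result dictionary (not real model output)
sample = {"text": " Invoice #123 ", "confidence": 0.91, "success": True}
print(text_or_none(sample))
```

Returning `None` rather than an empty string lets calling code tell "no usable text" apart from "the image contained no text".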

📊 Performance

🎯 Accuracy: High-accuracy OCR (reported confidence scores of up to 95%)
⏱️ Speed: ~13 seconds per image (high quality)
🌍 Languages: Multi-language support
💻 Device: CPU and GPU support
📄 Documents: Excellent for complex documents
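
The per-image time listed above is likely to vary with hardware and image size, so it can be worth measuring locally. A minimal sketch using only the standard library; `timed_ocr` is a hypothetical helper, and `model` is any object exposing `generate_ocr_text` as in the Quick Start:

```python
import time


def timed_ocr(model, image, use_native=True):
    """Run OCR once and return (result, elapsed_seconds).

    `model` is expected to expose generate_ocr_text(image, use_native=...),
    as in the Quick Start. This helper is illustrative, not part of the model.
    """
    start = time.perf_counter()
    result = model.generate_ocr_text(image, use_native=use_native)
    elapsed = time.perf_counter() - start
    return result, elapsed
```

Typical use would be `result, seconds = timed_ocr(model, image)`, repeated over a few representative images.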

🛠️ Features

✅ Hub Loading: Works with AutoModel.from_pretrained()
✅ High Accuracy: Based on Qwen2-VL-2B-Instruct
✅ Multi-language: Supports many languages
✅ Document OCR: Excellent for invoices, forms, documents
✅ Robust Processing: Multiple extraction methods
✅ Production Ready: Error handling included

📝 Usage Examples

Basic Usage

from transformers import AutoModel
from PIL import Image

model = AutoModel.from_pretrained("BabaK07/textract-ai", trust_remote_code=True)
image = Image.open("document.jpg")
result = model.generate_ocr_text(image, use_native=True)

High Accuracy Mode

result = model.generate_ocr_text(image, use_native=True)  # Best accuracy

Fast Mode

result = model.generate_ocr_text(image, use_native=False)  # Faster processing
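
The two modes can be combined: try the faster mode first and fall back to the more accurate one only when needed. A minimal sketch, assuming the result keys from the Quick Start; `ocr_with_fallback` and the 0.8 threshold are illustrative choices, not part of the model's API:

```python
def ocr_with_fallback(model, image, min_confidence=0.8):
    """Try the faster mode first; retry with use_native=True if needed.

    The 0.8 threshold is an illustrative assumption. The result keys
    ('success', 'confidence') follow the Quick Start output.
    """
    result = model.generate_ocr_text(image, use_native=False)
    if result.get("success") and result.get("confidence", 0.0) >= min_confidence:
        return result
    # Fast mode failed or was not confident enough: use the slower, more accurate mode
    return model.generate_ocr_text(image, use_native=True)
```

This pays for the slower mode only on images where the fast pass is not confident.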

File Path Input

result = model.generate_ocr_text("path/to/your/image.jpg")
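
Because a file path is accepted directly, a folder of images can be processed in a simple loop. A minimal sketch using `pathlib`; `ocr_directory` is a hypothetical helper and the extension list is an assumption, since the supported formats are not stated here:

```python
from pathlib import Path


def ocr_directory(model, directory, extensions=(".jpg", ".jpeg", ".png")):
    """Run OCR on every matching image in `directory`, keyed by file name.

    Paths are passed as strings, matching the file-path example above.
    The extension list is an assumption; adjust it to the formats you use.
    """
    results = {}
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.suffix.lower() in extensions:
            results[path.name] = model.generate_ocr_text(str(path))
    return results
```

Images are processed one at a time, so total run time grows linearly with the number of files.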

🔧 Installation

pip install torch transformers pillow

📈 Model Details

Base Model: Qwen/Qwen2-VL-2B-Instruct
Model Size: ~2.5B parameters
Architecture: Vision-Language Transformer
Optimization: OCR-specific processing
Training: Custom OCR pipeline

🆚 Comparison

Feature	Before (Broken)	After (FIXED)
Hub Loading	❌ ValueError	✅ Works perfectly
from_pretrained	❌ Missing	✅ Implemented
AutoModel	❌ Failed	✅ Compatible
Configuration	❌ Invalid	✅ Proper config

🎯 Use Cases

High-Accuracy OCR: When accuracy is most important
Document Processing: Complex invoices, forms, contracts
Multi-language Text: International documents
Professional OCR: Business and enterprise use
Research Applications: Academic and research projects

🔗 Related Models

pixeltext-ai: https://huggingface.co/BabaK07/pixeltext-ai (PaliGemma-based, faster)
Base Model: https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct

📞 Support

For issues or questions, please check the model repository or contact the author.

Status: ✅ FIXED and ready for production use!
