---
language:
- en
- zh
- es
- fr
- de
- ja
- ko
- ar
- hi
- ru
license: apache-2.0
tags:
- ocr
- vision-language
- qwen2-vl
- custom-model
- text-extraction
- document-ai
- high-accuracy
library_name: transformers
pipeline_tag: image-to-text
base_model: Qwen/Qwen2-VL-2B-Instruct
---
# textract-ai - FIXED VERSION ✅
**🎉 FIXED: Hub loading now works properly!**

A high-accuracy OCR model based on Qwen2-VL-2B-Instruct, now with proper Hugging Face Hub support.
## ✅ What's Fixed
- **Hub Loading**: `AutoModel.from_pretrained()` now works correctly
- **from_pretrained Method**: Proper implementation added
- **Configuration**: Fixed model configuration for Hub compatibility
- **Error Handling**: Improved error handling and fallbacks
## 🚀 Quick Start (NOW WORKS!)
```python
from transformers import AutoModel
from PIL import Image
# Load model from Hub (FIXED!)
model = AutoModel.from_pretrained("BabaK07/textract-ai", trust_remote_code=True)
# Load image
image = Image.open("your_image.jpg")
# Extract text
result = model.generate_ocr_text(image, use_native=True)
print(f"Text: {result['text']}")
print(f"Confidence: {result['confidence']:.1%}")
print(f"Success: {result['success']}")
```
## 📊 Performance
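- 🎯 **Accuracy**: High-accuracy OCR, with reported confidence scores of up to 95%
- ⏱️ **Speed**: ~13 seconds per image at high quality (actual latency depends on hardware)
- 🌍 **Languages**: Multi-language support (tagged: English, Chinese, Spanish, French, German, Japanese, Korean, Arabic, Hindi, Russian)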
- 💻 **Device**: CPU and GPU support
- 📄 **Documents**: Excellent for complex documents
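The ~13-second figure depends on your CPU or GPU, so it can be worth timing the model on your own hardware. Below is a minimal sketch. `time_per_image` is a hypothetical helper, not part of the model's API. It accepts any callable that takes one image, such as a lambda wrapping `model.generate_ocr_text`.

```python
import statistics
import time
from typing import Any, Callable, Iterable


def time_per_image(ocr_fn: Callable[[Any], Any], images: Iterable[Any]) -> dict:
    """Call ocr_fn once per image and report wall-clock latency in seconds.

    Hypothetical helper; ocr_fn could be, for example:
        lambda img: model.generate_ocr_text(img, use_native=True)
    """
    durations = []
    for image in images:
        start = time.perf_counter()
        ocr_fn(image)  # the OCR result is discarded; only elapsed time is recorded
        durations.append(time.perf_counter() - start)
    if not durations:
        raise ValueError("no images were provided")
    return {
        "count": len(durations),
        "mean_s": statistics.mean(durations),
        "max_s": max(durations),
    }
```

Running it on a few representative documents, once with `use_native=True` and once with `use_native=False`, gives a rough sense of the speed difference between the two modes on your machine.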
## 🛠️ Features
- ✅ **Hub Loading**: Works with `AutoModel.from_pretrained()`
- ✅ **High Accuracy**: Based on Qwen2-VL-2B-Instruct
- ✅ **Multi-language**: Supports many languages
- ✅ **Document OCR**: Excellent for invoices, forms, documents
- ✅ **Robust Processing**: Multiple extraction methods
- ✅ **Production Ready**: Error handling included
## 📝 Usage Examples
### Basic Usage
```python
from transformers import AutoModel
from PIL import Image
model = AutoModel.from_pretrained("BabaK07/textract-ai", trust_remote_code=True)
image = Image.open("document.jpg")
result = model.generate_ocr_text(image, use_native=True)
```
### High Accuracy Mode
```python
result = model.generate_ocr_text(image, use_native=True) # Best accuracy
```
### Fast Mode
```python
result = model.generate_ocr_text(image, use_native=False) # Faster processing
```
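The two modes can also be combined: run fast mode first, and repeat the call in high-accuracy mode only when the fast result looks unreliable. This is a sketch, not part of the model's API. `ocr_with_fallback` and the 0.8 threshold are hypothetical. The sketch also assumes that both modes return the `text`, `confidence` and `success` keys shown in the Quick Start, with `confidence` as a fraction between 0 and 1 (as the `:.1%` formatting there suggests).

```python
from typing import Any


def ocr_with_fallback(model: Any, image: Any, min_confidence: float = 0.8) -> dict:
    """Try fast mode first; fall back to high-accuracy (native) mode if needed.

    Assumption: model.generate_ocr_text returns a dict with 'text',
    'confidence' (0.0-1.0) and 'success' keys in both modes.
    """
    fast = model.generate_ocr_text(image, use_native=False)
    if fast.get("success") and fast.get("confidence", 0.0) >= min_confidence:
        return {**fast, "mode": "fast"}
    # Fast pass failed or scored below the threshold: run the slower pass.
    accurate = model.generate_ocr_text(image, use_native=True)
    return {**accurate, "mode": "native"}
```

Tune `min_confidence` against a sample of your own documents. A higher value sends more images down the slower path.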
### File Path Input
```python
result = model.generate_ocr_text("path/to/your/image.jpg")
```
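Because `generate_ocr_text` also accepts a file path, you can process a folder of scans in a simple loop. Below is a minimal sketch. The following are illustrative assumptions, not behavior documented for this model: the name `ocr_folder`, the extension list, and the choice to skip results whose `success` is false.

```python
from pathlib import Path
from typing import Any, Dict

# Illustrative set of extensions; adjust to the formats you actually have.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".bmp"}


def ocr_folder(model: Any, folder: str) -> Dict[str, str]:
    """OCR every image file directly inside `folder`, keyed by file name.

    Hypothetical helper: results with success=False are skipped.
    """
    texts = {}
    for path in sorted(Path(folder).iterdir()):
        # Skip subdirectories and files that do not look like images.
        if not path.is_file() or path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        result = model.generate_ocr_text(str(path))  # file path input
        if result.get("success"):
            texts[path.name] = result["text"]
    return texts
```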
## 🔧 Installation
```bash
pip install torch transformers pillow
```
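After installing, you can quickly confirm that the three packages are importable by looking them up with `importlib`. Note that Pillow is installed as `pillow` but imported as `PIL`. The helper name `missing_packages` is illustrative.

```python
from importlib.util import find_spec
from typing import List

# pip distribution name -> module name used in `import`
REQUIRED = {"torch": "torch", "transformers": "transformers", "pillow": "PIL"}


def missing_packages() -> List[str]:
    """Return the pip names of required packages that cannot be found."""
    return [pkg for pkg, module in REQUIRED.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages; install with: pip install " + " ".join(missing))
    else:
        print("All required packages found.")
```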
## 📈 Model Details
- **Base Model**: Qwen/Qwen2-VL-2B-Instruct
- **Model Size**: ~2.5B parameters
- **Architecture**: Vision-Language Transformer
- **Optimization**: OCR-specific processing
- **Training**: Custom OCR pipeline
## 🆚 Comparison
| Feature | Before (Broken) | After (FIXED) |
|---------|----------------|---------------|
| Hub Loading | ❌ ValueError | ✅ Works perfectly |
| from_pretrained | ❌ Missing | ✅ Implemented |
| AutoModel | ❌ Failed | ✅ Compatible |
| Configuration | ❌ Invalid | ✅ Proper config |
## 🎯 Use Cases
- **High-Accuracy OCR**: When accuracy is most important
- **Document Processing**: Complex invoices, forms, contracts
- **Multi-language Text**: International documents
- **Professional OCR**: Business and enterprise use
- **Research Applications**: Academic and research projects
## 🔗 Related Models
- **pixeltext-ai**: https://huggingface.co/BabaK07/pixeltext-ai (PaliGemma-based, faster)
- **Base Model**: https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct
## 📞 Support
For issues or questions, please check the model repository or contact the author.

---
**Status**: ✅ FIXED and ready for production use!