BabaK07/textract-ai

@@ -18,197 +18,129 @@ tags:
 - custom-model
 - text-extraction
 - document-ai
 library_name: transformers
 pipeline_tag: image-to-text
 base_model: Qwen/Qwen2-VL-2B-Instruct
-datasets:
-- custom
-metrics:
-- accuracy
-- bleu
-widget:
-- src: https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg
-  example_title: "Document OCR"
 ---
-# textract-ai
-A custom OCR (Optical Character Recognition) model built on top of Qwen2.5-VL-2B-Instruct, specifically designed for high-accuracy text extraction from images and documents.
-## Model Description
-This model combines the powerful vision-language capabilities of Qwen2.5-VL with custom OCR-specific heads to provide:
-- **High-accuracy text extraction** from images and documents
-- **Multi-language support** for 10+ languages
-- **Robust architecture** with fallback mechanisms
-- **Production-ready** inference capabilities
-- **Custom OCR heads** trained for text recognition tasks
-## Architecture
-```
-Custom OCR Model
-├── Qwen2.5-VL-2B (Frozen Backbone)
-│   ├── Vision Encoder (ViT-based)
-│   └── Language Model (Qwen2-2B)
-├── Custom OCR Heads
-│   ├── Text Recognition Head
-│   └── Confidence Estimation Head
-└── Multi-API Processing Pipeline
-```
-## Model Details
-- **Base Model**: Qwen/Qwen2-VL-2B-Instruct
-- **Model Size**: ~2.5B parameters
-- **Architecture**: Vision-Language Transformer with custom OCR heads
-- **Languages**: English, Chinese, Spanish, French, German, Japanese, Korean, Arabic, Hindi, Russian
-- **Input**: Images (JPEG, PNG, PDF, TIFF)
-- **Output**: Extracted text with confidence scores
-## Usage
-### Quick Start
 ```python
-from transformers import AutoModel, AutoProcessor
 from PIL import Image
-# Load model and processor
 model = AutoModel.from_pretrained("BabaK07/textract-ai", trust_remote_code=True)
-processor = AutoProcessor.from_pretrained("BabaK07/textract-ai")
 # Load image
-image = Image.open("document.jpg")
 # Extract text
 result = model.generate_ocr_text(image, use_native=True)
-print(f"Extracted text: {result['text']}")
-print(f"Confidence: {result['confidence']:.3f}")
-```
-### Advanced Usage
-```python
-import torch
-from PIL import Image
-# Load model
-model = AutoModel.from_pretrained("BabaK07/textract-ai", trust_remote_code=True)
-# Process image
-image = Image.open("invoice.jpg")
-# Extract text with custom parameters
-result = model.generate_ocr_text(
-    image=image,
-    use_native=True  # Use Qwen's native OCR capabilities
-)
-# Access detailed results
-print(f"Text: {result['text']}")
-print(f"Confidence: {result['confidence']}")
-print(f"Method: {result['method']}")
-```
-### Batch Processing
 ```python
 from PIL import Image
-import torch
-# Load multiple images
-images = [Image.open(f"doc_{i}.jpg") for i in range(5)]
-# Process batch
-results = []
-for image in images:
-    result = model.generate_ocr_text(image)
-    results.append(result)
-# Print results
-for i, result in enumerate(results):
-    print(f"Document {i+1}: {result['text'][:50]}...")
 ```
-## Performance
-- **Accuracy**: High accuracy on document OCR tasks
-- **Speed**: ~1-3 seconds per image (depending on hardware)
-- **Memory**: ~6GB GPU memory recommended
-- **Languages**: Supports 10+ major languages
-## Training
-This model was built using:
-- **Base Model**: Qwen2.5-VL-2B-Instruct (frozen)
-- **Custom Heads**: Trained OCR-specific layers
-- **Architecture**: Vision-language transformer with custom components
-- **Optimization**: Multiple API fallbacks for robustness
-## Limitations
-- Performance depends on image quality and text clarity
-- Best results with printed text; handwriting accuracy may vary
-- Requires sufficient GPU memory for optimal performance
-- Some complex layouts may need preprocessing
-## Use Cases
-- **Document Digitization**: Convert scanned documents to text
-- **Invoice Processing**: Extract data from invoices and receipts
-- **Form Processing**: Digitize forms and applications
-- **Multi-language Documents**: Process documents in various languages
-- **Batch Processing**: Handle large volumes of documents
-## Technical Details
-### Model Architecture
-- **Vision Encoder**: Based on Vision Transformer (ViT)
-- **Language Decoder**: Qwen2-2B language model
-- **Custom Heads**: OCR-specific text recognition and confidence estimation
-- **Integration**: Multiple API approaches for robustness
-### Inference Pipeline
-1. Image preprocessing and normalization
-2. Vision feature extraction using Qwen's ViT encoder
-3. Text generation using language model
-4. Confidence estimation and post-processing
-5. Multiple fallback methods for reliability
-## Installation
-```bash
-pip install transformers torch pillow
-```
-For GPU support:
-```bash
-pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
-```
-## Citation
-```bibtex
-@software{custom_ocr_qwen,
-  title={Custom OCR Model based on Qwen2.5-VL},
-  author={BabaK07},
-  year={2024},
-  url={https://huggingface.co/BabaK07/textract-ai}
-}
-```
-## License
-This model is released under the Apache 2.0 license, following the base Qwen2.5-VL model license.
-## Acknowledgments
-- Built on top of [Qwen2.5-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct)
-- Thanks to the Qwen team for the excellent base model
-- Custom architecture and training by BabaK07
-## Contact
-For questions or issues, please open an issue on the model repository or contact the author.

 - custom-model
 - text-extraction
 - document-ai
+- high-accuracy
 library_name: transformers
 pipeline_tag: image-to-text
 base_model: Qwen/Qwen2-VL-2B-Instruct
 ---
+# textract-ai - FIXED VERSION ✅
+**🎉 FIXED: Hub loading now works properly!**
+A high-accuracy OCR model based on Qwen2-VL-2B-Instruct, now with proper Hugging Face Hub support.
+## ✅ What's Fixed
+- **Hub Loading**: `AutoModel.from_pretrained()` now works correctly
+- **`from_pretrained` Method**: Proper implementation added
+- **Configuration**: Fixed model configuration for Hub compatibility
+- **Error Handling**: Improved error handling and fallbacks
+## 🚀 Quick Start (NOW WORKS!)
 ```python
+from transformers import AutoModel
 from PIL import Image
+# Load model from Hub (FIXED!)
 model = AutoModel.from_pretrained("BabaK07/textract-ai", trust_remote_code=True)
 # Load image
+image = Image.open("your_image.jpg")
 # Extract text
 result = model.generate_ocr_text(image, use_native=True)
+print(f"Text: {result['text']}")
+print(f"Confidence: {result['confidence']:.1%}")
+print(f"Success: {result['success']}")
+```
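The Quick Start prints three keys from the result: `text`, `confidence`, and `success`. A minimal sketch of how calling code might check them before trusting the output is shown here. The helper name `summarize_result` and the `min_confidence` threshold are hypothetical and not part of the model's API; the sketch also assumes `confidence` is a float between 0 and 1, as the `:.1%` formatting above suggests.

```python
from typing import Any, Mapping


def summarize_result(result: Mapping[str, Any], min_confidence: float = 0.5) -> str:
    """Turn an OCR result dict into a one-line status string (hypothetical helper)."""
    # `success`, `text`, and `confidence` are the keys printed in the Quick Start.
    # .get() keeps this helper from raising if a key is absent.
    if not result.get("success", False):
        return "OCR failed"
    text = str(result.get("text", "")).strip()
    confidence = float(result.get("confidence", 0.0))
    if not text:
        return "OCR succeeded but returned no text"
    # The threshold is an arbitrary illustration, not a documented cutoff.
    flag = "" if confidence >= min_confidence else " (low confidence, review manually)"
    return f"{len(text)} characters at {confidence:.1%}{flag}"
```

Calling `summarize_result(result)` on the dictionary returned by `model.generate_ocr_text(...)` gives a short string suitable for logging.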
+## 📊 Performance
+- 🎯 **Accuracy**: High-accuracy OCR (reported confidence scores of up to 95%)
+- ⏱️ **Speed**: ~13 seconds per image (high quality)
+- 🌍 **Languages**: Multi-language support
+- 💻 **Device**: CPU and GPU support
+- 📄 **Documents**: Excellent for complex documents
+## 🛠️ Features
+- ✅ **Hub Loading**: Works with `AutoModel.from_pretrained()`
+- ✅ **High Accuracy**: Based on Qwen2-VL-2B-Instruct
+- ✅ **Multi-language**: Supports many languages
+- ✅ **Document OCR**: Excellent for invoices, forms, documents
+- ✅ **Robust Processing**: Multiple extraction methods
+- ✅ **Production Ready**: Error handling included
+## 📝 Usage Examples
+### Basic Usage
 ```python
+from transformers import AutoModel
 from PIL import Image
+model = AutoModel.from_pretrained("BabaK07/textract-ai", trust_remote_code=True)
+image = Image.open("document.jpg")
+result = model.generate_ocr_text(image, use_native=True)
 ```
+### High Accuracy Mode
+```python
+result = model.generate_ocr_text(image, use_native=True)  # Best accuracy
+```
+### Fast Mode
+```python
+result = model.generate_ocr_text(image, use_native=False)  # Faster processing
+```
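The two modes can be combined: try the fast path first and only pay for the slower one when the result looks weak. The following is a sketch under stated assumptions, not a documented feature. The function name `ocr_with_fallback` and the `min_confidence` threshold are hypothetical, and it assumes both modes return a dictionary with the `success` and `confidence` keys shown in the Quick Start.

```python
from typing import Any, Dict


def ocr_with_fallback(model: Any, image: Any, min_confidence: float = 0.8) -> Dict[str, Any]:
    """Run fast mode first; retry with use_native=True if the result looks weak."""
    # Fast pass, as in the "Fast Mode" example.
    fast = model.generate_ocr_text(image, use_native=False)
    if fast.get("success") and fast.get("confidence", 0.0) >= min_confidence:
        return fast
    # Fall back to the slower mode from the "High Accuracy Mode" example.
    return model.generate_ocr_text(image, use_native=True)
```

Whether this saves time in practice depends on how often the fast pass clears the threshold, so the cutoff is worth tuning on representative images.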
+### File Path Input
+```python
+result = model.generate_ocr_text("path/to/your/image.jpg")
+```
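Because a file path can be passed directly, processing several documents reduces to a loop. The sketch below is illustrative only: `ocr_files` is a hypothetical helper, and the `error` key it adds on failure is an assumption of this example rather than something the model returns.

```python
from typing import Any, Dict, Iterable


def ocr_files(model: Any, paths: Iterable[str]) -> Dict[str, Dict[str, Any]]:
    """Run OCR on each path and collect results keyed by path (hypothetical helper)."""
    results: Dict[str, Dict[str, Any]] = {}
    for path in paths:
        try:
            # Passing a path string mirrors the "File Path Input" example.
            results[path] = model.generate_ocr_text(path)
        except Exception as exc:
            # One unreadable file should not abort the whole batch.
            # The `error` key is this sketch's own addition.
            results[path] = {"text": "", "confidence": 0.0, "success": False, "error": str(exc)}
    return results
```

The model is loaded once and reused for every file, which avoids repeating the `from_pretrained` call per document.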
+## 🔧 Installation
+```bash
+pip install torch transformers pillow
+```
+## 📈 Model Details
+- **Base Model**: Qwen/Qwen2-VL-2B-Instruct
+- **Model Size**: ~2.5B parameters
+- **Architecture**: Vision-Language Transformer
+- **Optimization**: OCR-specific processing
+- **Training**: Custom OCR pipeline
+## 🆚 Comparison
+| Feature | Before (Broken) | After (FIXED) |
+|---------|----------------|---------------|
+| Hub Loading | ❌ ValueError | ✅ Works perfectly |
+| from_pretrained | ❌ Missing | ✅ Implemented |
+| AutoModel | ❌ Failed | ✅ Compatible |
+| Configuration | ❌ Invalid | ✅ Proper config |
+## 🎯 Use Cases
+- **High-Accuracy OCR**: When accuracy is most important
+- **Document Processing**: Complex invoices, forms, contracts
+- **Multi-language Text**: International documents
+- **Professional OCR**: Business and enterprise use
+- **Research Applications**: Academic and research projects
+## 🔗 Related Models
+- **pixeltext-ai**: https://huggingface.co/BabaK07/pixeltext-ai (PaliGemma-based, faster)
+- **Base Model**: https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct
+## 📞 Support
+For issues or questions, please check the model repository or contact the author.
+---
+**Status**: ✅ FIXED and ready for production use!