DaSword
/

CAT-CLIP

ONNX

clip

Model card Files Files and versions Community

DaSword commited on Jul 9

Commit

c5c8ea2

verified ·

1 Parent(s): 4a2571f

Update README.md

Browse files

Files changed (1) hide show

README.md +170 -18

README.md CHANGED Viewed

@@ -1,31 +1,183 @@
 ---
 base_model: openai/clip-vit-base-patch32
-library_name: transformers.js
 ---
-https://huggingface.co/openai/clip-vit-base-patch32 with ONNX weights to be compatible with Transformers.js.
-## Usage (Transformers.js)
-If you haven't already, you can install the [Transformers.js](https://huggingface.co/docs/transformers.js) JavaScript library from [NPM](https://www.npmjs.com/package/@huggingface/transformers) using:
 ```bash
-npm i @huggingface/transformers
 ```
-**Example:** Perform zero-shot image classification with the `pipeline` API.
-```js
-import { pipeline } from '@huggingface/transformers';
-const classifier = await pipeline('zero-shot-image-classification', 'Xenova/clip-vit-base-patch32');
-const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/tiger.jpg';
-const output = await classifier(url, ['tiger', 'horse', 'dog']);
-// [
-//   { score: 0.9993917942047119, label: 'tiger' },
-//   { score: 0.0003519294841680676, label: 'horse' },
-//   { score: 0.0002562698791734874, label: 'dog' }
-// ]
 ```
 ---
-Note: Having a separate repo for ONNX weights is intended to be a temporary solution until WebML gains more traction. If you would like to make your models web-ready, we recommend converting to ONNX using [🤗 Optimum](https://huggingface.co/docs/optimum/index) and structuring your repo like this one (with ONNX weights located in a subfolder named `onnx`).

 ---
 base_model: openai/clip-vit-base-patch32
+license: mit
 ---
+# CAT-CLIP: Cryptocurrency Analysis Tool - CLIP
+A simplified ONNX implementation of OpenAI's CLIP model specifically optimized for cryptocurrency-related image analysis tasks. This repository provides quantized ONNX models based on [Xenova/clip-vit-base-patch32](https://huggingface.co/Xenova/clip-vit-base-patch32), which itself is derived from [openai/clip-vit-base-patch32](https://huggingface.co/openai/clip-vit-base-patch32).
+## Overview
+This repository contains:
+- **Quantized ONNX models** (`text_model_q4f16.onnx`, `vision_model_q4f16.onnx`) for efficient inference
+- **Tokenizer and preprocessing configurations** compatible with Transformers.js
+- **Optimized model weights** for cryptocurrency-specific image classification tasks
+While currently a repackaged version of the base model, this repository serves as a foundation for future cryptocurrency-specific model distillation and fine-tuning efforts.
+## Usage
+### Python (ONNX Runtime)
+For more advanced cryptocurrency-specific use cases, see the example implementation in our classifier:
+```python
+from src.models.classifier import ImageClassifier
+from src.config.config import Config
+from PIL import Image
+# Initialize classifier with crypto-specific classes
+config = Config()
+classifier = ImageClassifier(config)
+# Load image
+image = Image.open("path/to/crypto_image.jpg")
+# Classify for cryptocurrency content
+result = classifier.predict(image)
+print(result)
+# Output: {'seed_phrase': 0.95, 'address': 0.02, 'handwriting': 0.03}
+# Get final classification
+classification = classifier._classify_image(image, result)
+print(f"Classification: {classification}")
+# Output: Classification: seed_phrase
+```
+**Batch processing:**
+```python
+images = [Image.open(f"image_{i}.jpg") for i in range(5)]
+results, classifications = classifier.predict_batch(images)
+for i, (result, classification) in enumerate(zip(results, classifications)):
+    print(f"Image {i}: {classification} (confidence: {result[classification]:.3f})")
+```
+## Current Capabilities
+The model is currently optimized for three main cryptocurrency-related classification tasks:
+1. **Seed Phrase Detection**: Identifies images containing cryptocurrency recovery/seed phrases or mnemonics
+2. **Crypto Address Detection**: Recognizes cryptocurrency addresses (26-35 characters) and associated QR codes
+3. **Handwriting Detection**: Detects handwritten text, particularly useful for identifying handwritten wallet information
+## Future Work
+We have several exciting developments planned to enhance this model's efficacy for cryptocurrency-specific problemsets:
+### Model Distillation & Optimization
+- **Domain-specific distillation**: Create a smaller, faster model trained specifically on cryptocurrency-related imagery
+- **Quantization improvements**: Explore INT8 and mixed-precision quantization for even better performance
+- **Hardware-specific optimizations**: Optimize models for mobile devices and edge computing scenarios
+### Enhanced Crypto-Specific Features
+- **Multi-language support**: Extend seed phrase detection to support mnemonics in multiple languages
+- **Blockchain-specific addressing**: Improve detection for various blockchain address formats (Bitcoin, Ethereum, etc.)
+- **Document structure analysis**: Better understanding of wallet documents, exchange screenshots, and transaction receipts
+- **Temporal analysis**: Detect and analyze sequences of images for comprehensive wallet recovery scenarios
+### Training Data & Fine-tuning
+- **Synthetic data generation**: Create large-scale synthetic datasets of cryptocurrency-related imagery
+- **Active learning pipeline**: Implement continuous learning from user feedback and corrections
+- **Cross-modal training**: Incorporate OCR text extraction with visual understanding for better accuracy
+### Performance & Scalability
+- **Real-time inference**: Optimize for sub-100ms inference times on consumer hardware
+- **Batch processing optimizations**: Improve efficiency for large-scale image analysis tasks
+- **Model compression**: Achieve similar accuracy with significantly smaller model sizes
+### Integration & Deployment
+- **REST API development**: Create production-ready APIs for easy integration
+- **Browser extension support**: Enable direct use in web browsers for real-time analysis
+- **Mobile SDKs**: Develop native mobile libraries for iOS and Android applications
+## Model Architecture
+- **Base Model**: OpenAI CLIP ViT-B/32
+- **Vision Encoder**: Vision Transformer (ViT) with 32x32 patch size
+- **Text Encoder**: Transformer-based text encoder
+- **Quantization**: Q4F16 (4-bit weights, 16-bit activations)
+- **Context Length**: 77 tokens
+- **Image Resolution**: 224x224 pixels
+## License
+This project is licensed under the MIT License, consistent with the original OpenAI CLIP model.
+### Original Model Licenses
+- **OpenAI CLIP**: MIT License - [openai/CLIP](https://github.com/openai/CLIP)
+- **Xenova CLIP**: MIT License - [Xenova/clip-vit-base-patch32](https://huggingface.co/Xenova/clip-vit-base-patch32)
+The MIT License permits commercial use, modification, distribution, and private use. See the [LICENSE](https://github.com/openai/CLIP/blob/main/LICENSE) file in the original OpenAI repository for full details.
+## Attribution
+This work builds upon several excellent open-source projects:
+- **OpenAI CLIP**: The foundational model and research by Alec Radford, Jong Wook Kim, et al.
+- **Xenova (Joshua)**: ONNX conversion and Transformers.js compatibility
+- **Hugging Face**: Model hosting and transformers library infrastructure
+- **Microsoft ONNX Runtime**: High-performance inference engine
+## Contributing
+We welcome contributions to improve this cryptocurrency-specific CLIP implementation! Here's how you can help:
+### Ways to Contribute
+1. **Bug Reports**: Found an issue? Please open a GitHub issue with detailed reproduction steps
+2. **Feature Requests**: Have ideas for crypto-specific enhancements? We'd love to hear them
+3. **Code Contributions**: Submit pull requests for bug fixes or new features
+4. **Dataset Contributions**: Help us build better training data for cryptocurrency use cases
+5. **Documentation**: Improve our documentation, examples, and tutorials
+### Development Setup
 ```bash
+# Clone the repository
+git clone https://github.com/yourusername/CAT-CLIP.git
+cd CAT-CLIP
+# Install dependencies
+pip install -r requirements.txt
+# Run tests
+python -m pytest tests/
 ```
+### Contribution Guidelines
+- Follow PEP 8 style guidelines for Python code
+- Include tests for new functionality
+- Update documentation for any new features
+- Ensure compatibility with both CPU and GPU inference
+- Test changes across different image types and sizes
+### Code of Conduct
+This project follows the [Contributor Covenant](https://www.contributor-covenant.org/) Code of Conduct. Please be respectful and inclusive in all interactions.
+## Citation
+If you use this model in your research or applications, please cite:
+```bibtex
+@misc{cat-clip-2024,
+  title={CAT-CLIP: Cryptocurrency Analysis Tool - CLIP},
+  author={Your Name},
+  year={2024},
+  url={https://github.com/yourusername/CAT-CLIP}
+}
+@article{radford2021learning,
+  title={Learning transferable visual models from natural language supervision},
+  author={Radford, Alec and Kim, Jong Wook and Hallacy, Chris and Ramesh, Aditya and Goh, Gabriel and Agarwal, Sandhini and Sastry, Girish and Askell, Amanda and Mishkin, Pamela and Clark, Jack and others},
+  journal={International conference on machine learning},
+  year={2021}
+}
 ```
 ---
+**Note**: This is a specialized implementation intended for cryptocurrency-related image analysis. For general-purpose CLIP usage, consider using the original [OpenAI CLIP](https://github.com/openai/CLIP) or [Xenova's implementation](https://huggingface.co/Xenova/clip-vit-base-patch32) directly.