CAT-CLIP: Cryptocurrency Analysis Tool - CLIP
A simplified ONNX distribution of OpenAI's CLIP model, targeted at cryptocurrency-related image analysis tasks. This repository provides quantized ONNX models based on Xenova/clip-vit-base-patch32, which itself is derived from openai/clip-vit-base-patch32.
Overview
This repository contains:
- Quantized ONNX models (text_model_q4f16.onnx, vision_model_q4f16.onnx) for efficient inference
- Tokenizer and preprocessing configurations compatible with Transformers.js
- Optimized model weights for cryptocurrency-specific image classification tasks
While the weights are currently a straight repackaging of the base model, this repository serves as a foundation for future cryptocurrency-specific model distillation and fine-tuning efforts.
Usage
Python (ONNX Runtime)
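A minimal sketch of running the quantized vision encoder directly with ONNX Runtime. The input name pixel_values is the usual name in Xenova's CLIP exports, but treat it as an assumption here: inspect session.get_inputs() to confirm, and replace the random tensor with properly preprocessed image data.
import numpy as np
import onnxruntime as ort
# Load the quantized vision encoder (adjust the path to your local copy)
session = ort.InferenceSession("vision_model_q4f16.onnx")
# CLIP ViT-B/32 expects normalized RGB images of shape (batch, 3, 224, 224);
# a random tensor stands in for real preprocessed pixel data in this sketch
pixel_values = np.random.rand(1, 3, 224, 224).astype(np.float32)
print([i.name for i in session.get_inputs()])  # confirm the actual input name
outputs = session.run(None, {"pixel_values": pixel_values})
print(outputs[0].shape)  # embedding / hidden state, depending on the export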
For more advanced cryptocurrency-specific use cases, see the example implementation in our classifier:
from src.models.classifier import ImageClassifier
from src.config.config import Config
from PIL import Image
# Initialize classifier with crypto-specific classes
config = Config()
classifier = ImageClassifier(config)
# Load image
image = Image.open("path/to/crypto_image.jpg")
# Classify for cryptocurrency content
result = classifier.predict(image)
print(result)
# Output: {'seed_phrase': 0.95, 'address': 0.02, 'handwriting': 0.03}
# Get final classification
classification = classifier._classify_image(image, result)
print(f"Classification: {classification}")
# Output: Classification: seed_phrase
Batch processing:
images = [Image.open(f"image_{i}.jpg") for i in range(5)]
results, classifications = classifier.predict_batch(images)
for i, (result, classification) in enumerate(zip(results, classifications)):
print(f"Image {i}: {classification} (confidence: {result[classification]:.3f})")
Current Capabilities
The model currently supports three main cryptocurrency-related classification tasks (a zero-shot sketch follows the list):
- Seed Phrase Detection: Identifies images containing cryptocurrency recovery/seed phrases or mnemonics
- Crypto Address Detection: Recognizes cryptocurrency addresses (e.g., 26-35 characters for Bitcoin-style base58 addresses) and associated QR codes
- Handwriting Detection: Detects handwritten text, particularly useful for identifying handwritten wallet information
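As an illustration of how these three classes can be scored with plain zero-shot CLIP, here is a minimal sketch using the base PyTorch checkpoint via the transformers library. The prompt wordings are illustrative assumptions; the actual prompts used by ImageClassifier may differ.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
# Hypothetical prompt wordings for the three classes
prompts = [
    "a photo of a cryptocurrency seed phrase",
    "a photo of a cryptocurrency wallet address or QR code",
    "a photo of handwritten text",
]
image = Image.open("path/to/crypto_image.jpg")
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)  # shape (1, 3)
print(dict(zip(["seed_phrase", "address", "handwriting"], probs[0].tolist())))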
Future Work
We have several developments planned to enhance this model's efficacy for cryptocurrency-specific problem sets:
Model Distillation & Optimization
- Domain-specific distillation: Create a smaller, faster model trained specifically on cryptocurrency-related imagery
- Quantization improvements: Explore INT8 and mixed-precision quantization for even better performance
- Hardware-specific optimizations: Optimize models for mobile devices and edge computing scenarios
Enhanced Crypto-Specific Features
- Multi-language support: Extend seed phrase detection to support mnemonics in multiple languages
- Blockchain-specific addressing: Improve detection for various blockchain address formats (Bitcoin, Ethereum, etc.)
- Document structure analysis: Better understanding of wallet documents, exchange screenshots, and transaction receipts
- Temporal analysis: Detect and analyze sequences of images for comprehensive wallet recovery scenarios
Training Data & Fine-tuning
- Synthetic data generation: Create large-scale synthetic datasets of cryptocurrency-related imagery
- Active learning pipeline: Implement continuous learning from user feedback and corrections
- Cross-modal training: Incorporate OCR text extraction with visual understanding for better accuracy
Performance & Scalability
- Real-time inference: Optimize for sub-100ms inference times on consumer hardware
- Batch processing optimizations: Improve efficiency for large-scale image analysis tasks
- Model compression: Achieve similar accuracy with significantly smaller model sizes
Integration & Deployment
- REST API development: Create production-ready APIs for easy integration
- Browser extension support: Enable direct use in web browsers for real-time analysis
- Mobile SDKs: Develop native mobile libraries for iOS and Android applications
Model Architecture
- Base Model: OpenAI CLIP ViT-B/32
- Vision Encoder: Vision Transformer (ViT) with 32x32 patch size
- Text Encoder: Transformer-based text encoder
- Quantization: Q4F16 (4-bit quantized weights, float16 activations)
- Context Length: 77 tokens
- Image Resolution: 224x224 pixels
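The context length and image resolution above can be checked against the base checkpoint's processor; a small sketch (the image_processor attribute assumes a recent transformers version):
from transformers import CLIPProcessor
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
# Text side: inputs are padded/truncated to the 77-token context length
tokens = processor.tokenizer(
    "a seed phrase written on paper",
    padding="max_length", truncation=True, max_length=77, return_tensors="np",
)
print(tokens["input_ids"].shape)  # (1, 77)
# Vision side: 224x224 input split into 32x32 patches -> (224/32)**2 = 49 patches
print(processor.image_processor.crop_size)  # {'height': 224, 'width': 224}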
License
This project is licensed under the MIT License, consistent with the original OpenAI CLIP model.
Original Model Licenses
- OpenAI CLIP: MIT License - openai/CLIP
- Xenova CLIP: MIT License - Xenova/clip-vit-base-patch32
The MIT License permits commercial use, modification, distribution, and private use. See the LICENSE file in the original OpenAI repository for full details.
Attribution
This work builds upon several excellent open-source projects:
- OpenAI CLIP: The foundational model and research by Alec Radford, Jong Wook Kim, et al.
- Xenova (Joshua): ONNX conversion and Transformers.js compatibility
- Hugging Face: Model hosting and transformers library infrastructure
- Microsoft ONNX Runtime: High-performance inference engine
Contributing
We welcome contributions to improve this cryptocurrency-specific CLIP implementation! Here's how you can help:
Ways to Contribute
- Bug Reports: Found an issue? Please open a GitHub issue with detailed reproduction steps
- Feature Requests: Have ideas for crypto-specific enhancements? We'd love to hear them
- Code Contributions: Submit pull requests for bug fixes or new features
- Dataset Contributions: Help us build better training data for cryptocurrency use cases
- Documentation: Improve our documentation, examples, and tutorials
Development Setup
# Clone the repository
git clone https://github.com/yourusername/CAT-CLIP.git
cd CAT-CLIP
# Install dependencies
pip install -r requirements.txt
# Run tests
python -m pytest tests/
Contribution Guidelines
- Follow PEP 8 style guidelines for Python code
- Include tests for new functionality
- Update documentation for any new features
- Ensure compatibility with both CPU and GPU inference
- Test changes across different image types and sizes
Code of Conduct
This project follows the Contributor Covenant Code of Conduct. Please be respectful and inclusive in all interactions.
Citation
If you use this model in your research or applications, please cite:
@misc{cat-clip-2024,
  title={CAT-CLIP: Cryptocurrency Analysis Tool - CLIP},
  author={Your Name},
  year={2024},
  url={https://github.com/yourusername/CAT-CLIP}
}
@inproceedings{radford2021learning,
  title={Learning transferable visual models from natural language supervision},
  author={Radford, Alec and Kim, Jong Wook and Hallacy, Chris and Ramesh, Aditya and Goh, Gabriel and Agarwal, Sandhini and Sastry, Girish and Askell, Amanda and Mishkin, Pamela and Clark, Jack and others},
  booktitle={International Conference on Machine Learning},
  year={2021}
}
Note: This is a specialized implementation intended for cryptocurrency-related image analysis. For general-purpose CLIP usage, consider using the original OpenAI CLIP or Xenova's implementation directly.