CAT-CLIP: Cryptocurrency Analysis Tool - CLIP
A simplified ONNX distribution of OpenAI's CLIP model, targeted at cryptocurrency-related image analysis tasks. This repository provides quantized ONNX models based on Xenova/clip-vit-base-patch32, which itself is derived from openai/clip-vit-base-patch32.
Overview
This repository contains:
- Quantized ONNX models (text_model_q4f16.onnx, vision_model_q4f16.onnx) for efficient inference
- Tokenizer and preprocessing configurations compatible with Transformers.js
- Optimized model weights for cryptocurrency-specific image classification tasks
While the weights are currently a straight repackaging of the base model, this repository serves as a foundation for future cryptocurrency-specific model distillation and fine-tuning efforts.
Usage
Python (ONNX Runtime)
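A minimal sketch of running the quantized vision encoder directly with ONNX Runtime. The input name pixel_values is the usual name in Xenova's CLIP exports, but treat it as an assumption here: inspect session.get_inputs() to confirm, and replace the random tensor with properly preprocessed image data.
import numpy as np
import onnxruntime as ort
# Load the quantized vision encoder (adjust the path to your local copy)
session = ort.InferenceSession("vision_model_q4f16.onnx")
# CLIP ViT-B/32 expects normalized RGB images of shape (batch, 3, 224, 224);
# a random tensor stands in for real preprocessed pixel data in this sketch
pixel_values = np.random.rand(1, 3, 224, 224).astype(np.float32)
print([i.name for i in session.get_inputs()])  # confirm the actual input name
outputs = session.run(None, {"pixel_values": pixel_values})
print(outputs[0].shape)  # embedding / hidden state, depending on the export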
For more advanced cryptocurrency-specific use cases, see the example implementation in our classifier:
from src.models.classifier import ImageClassifier
from src.config.config import Config
from PIL import Image
# Initialize classifier with crypto-specific classes
config = Config()
classifier = ImageClassifier(config)
# Load image
image = Image.open("path/to/crypto_image.jpg")
# Classify for cryptocurrency content
result = classifier.predict(image)
print(result)
# Output: {'seed_phrase': 0.95, 'address': 0.02, 'handwriting': 0.03}
# Get final classification
classification = classifier._classify_image(image, result)
print(f"Classification: {classification}")
# Output: Classification: seed_phrase
Batch processing:
images = [Image.open(f"image_{i}.jpg") for i in range(5)]
results, classifications = classifier.predict_batch(images)
for i, (result, classification) in enumerate(zip(results, classifications)):
print(f"Image {i}: {classification} (confidence: {result[classification]:.3f})")
Current Capabilities
The model currently supports three main cryptocurrency-related classification tasks (a zero-shot sketch follows the list):
- Seed Phrase Detection: Identifies images containing cryptocurrency recovery/seed phrases or mnemonics
- Crypto Address Detection: Recognizes cryptocurrency addresses (e.g., 26-35 characters for Bitcoin-style base58 addresses) and associated QR codes
- Handwriting Detection: Detects handwritten text, particularly useful for identifying handwritten wallet information
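As an illustration of how these three classes can be scored with plain zero-shot CLIP, here is a minimal sketch using the base PyTorch checkpoint via the transformers library. The prompt wordings are illustrative assumptions; the actual prompts used by ImageClassifier may differ.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
# Hypothetical prompt wordings for the three classes
prompts = [
    "a photo of a cryptocurrency seed phrase",
    "a photo of a cryptocurrency wallet address or QR code",
    "a photo of handwritten text",
]
image = Image.open("path/to/crypto_image.jpg")
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)  # shape (1, 3)
print(dict(zip(["seed_phrase", "address", "handwriting"], probs[0].tolist())))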
Future Work
We have several developments planned to enhance this model's efficacy for cryptocurrency-specific problem sets:
Model Distillation & Optimization
- Domain-specific distillation: Create a smaller, faster model trained specifically on cryptocurrency-related imagery
- Quantization improvements: Explore INT8 and mixed-precision quantization for even better performance
- Hardware-specific optimizations: Optimize models for mobile devices and edge computing scenarios
Enhanced Crypto-Specific Features
- Multi-language support: Extend seed phrase detection to support mnemonics in multiple languages
- Blockchain-specific addressing: Improve detection for various blockchain address formats (Bitcoin, Ethereum, etc.)
- Document structure analysis: Better understanding of wallet documents, exchange screenshots, and transaction receipts
- Temporal analysis: Detect and analyze sequences of images for comprehensive wallet recovery scenarios
Training Data & Fine-tuning
- Synthetic data generation: Create large-scale synthetic datasets of cryptocurrency-related imagery
- Active learning pipeline: Implement continuous learning from user feedback and corrections
- Cross-modal training: Incorporate OCR text extraction with visual understanding for better accuracy
Performance & Scalability
- Real-time inference: Optimize for sub-100ms inference times on consumer hardware
- Batch processing optimizations: Improve efficiency for large-scale image analysis tasks
- Model compression: Achieve similar accuracy with significantly smaller model sizes
Integration & Deployment
- REST API development: Create production-ready APIs for easy integration
- Browser extension support: Enable direct use in web browsers for real-time analysis
- Mobile SDKs: Develop native mobile libraries for iOS and Android applications
Model Architecture
- Base Model: OpenAI CLIP ViT-B/32
- Vision Encoder: Vision Transformer (ViT) with 32x32 patch size
- Text Encoder: Transformer-based text encoder
- Quantization: Q4F16 (4-bit quantized weights, float16 activations)
- Context Length: 77 tokens
- Image Resolution: 224x224 pixels
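The context length and image resolution above can be checked against the base checkpoint's processor; a small sketch (the image_processor attribute assumes a recent transformers version):
from transformers import CLIPProcessor
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
# Text side: inputs are padded/truncated to the 77-token context length
tokens = processor.tokenizer(
    "a seed phrase written on paper",
    padding="max_length", truncation=True, max_length=77, return_tensors="np",
)
print(tokens["input_ids"].shape)  # (1, 77)
# Vision side: 224x224 input split into 32x32 patches -> (224/32)**2 = 49 patches
print(processor.image_processor.crop_size)  # {'height': 224, 'width': 224}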
License
This project is licensed under the MIT License, consistent with the original OpenAI CLIP model.
Original Model Licenses
- OpenAI CLIP: MIT License - openai/CLIP
- Xenova CLIP: MIT License - Xenova/clip-vit-base-patch32
The MIT License permits commercial use, modification, distribution, and private use. See the LICENSE file in the original OpenAI repository for full details.
Attribution
This work builds upon several excellent open-source projects:
- OpenAI CLIP: The foundational model and research by Alec Radford, Jong Wook Kim, et al.
- Xenova (Joshua): ONNX conversion and Transformers.js compatibility
- Hugging Face: Model hosting and transformers library infrastructure
- Microsoft ONNX Runtime: High-performance inference engine
Contributing
We welcome contributions to improve this cryptocurrency-specific CLIP implementation! Here's how you can help:
Ways to Contribute
- Bug Reports: Found an issue? Please open a GitHub issue with detailed reproduction steps
- Feature Requests: Have ideas for crypto-specific enhancements? We'd love to hear them
- Code Contributions: Submit pull requests for bug fixes or new features
- Dataset Contributions: Help us build better training data for cryptocurrency use cases
- Documentation: Improve our documentation, examples, and tutorials
Development Setup
# Clone the repository
git clone https://github.com/yourusername/CAT-CLIP.git
cd CAT-CLIP
# Install dependencies
pip install -r requirements.txt
# Run tests
python -m pytest tests/
Contribution Guidelines
- Follow PEP 8 style guidelines for Python code
- Include tests for new functionality
- Update documentation for any new features
- Ensure compatibility with both CPU and GPU inference
- Test changes across different image types and sizes
Code of Conduct
This project follows the Contributor Covenant Code of Conduct. Please be respectful and inclusive in all interactions.
Citation
If you use this model in your research or applications, please cite:
@misc{cat-clip-2024,
  title={CAT-CLIP: Cryptocurrency Analysis Tool - CLIP},
  author={Your Name},
  year={2024},
  url={https://github.com/yourusername/CAT-CLIP}
}
@inproceedings{radford2021learning,
  title={Learning transferable visual models from natural language supervision},
  author={Radford, Alec and Kim, Jong Wook and Hallacy, Chris and Ramesh, Aditya and Goh, Gabriel and Agarwal, Sandhini and Sastry, Girish and Askell, Amanda and Mishkin, Pamela and Clark, Jack and others},
  booktitle={International Conference on Machine Learning},
  year={2021}
}
Note: This is a specialized implementation intended for cryptocurrency-related image analysis. For general-purpose CLIP usage, consider using the original OpenAI CLIP or Xenova's implementation directly.