---
base_model:
  - openai/clip-vit-large-patch14
datasets:
  - ILSVRC/imagenet-1k
  - mlfoundations/datacomp_small
license: mit
library_name: transformers
pipeline_tag: feature-extraction
---

[Paper]   [Code]

This model is initialized from openai/clip-vit-large-patch14. The image encoder is fine-tuned with FARE at $\epsilon = 2/255$, and the text encoder is fine-tuned with LEAF at $k=1$ with $\rho = 50$ and semantic constraints.

To load this model, use:

```python
from transformers import CLIPModel, CLIPProcessor

# Fine-tuned checkpoint; the processor is unchanged from the base model.
model_name = "LEAF-CLIP/CLIP-ViT-L-rho50-k1-FARE2"
processor_name = "openai/clip-vit-large-patch14"

model = CLIPModel.from_pretrained(model_name)
processor = CLIPProcessor.from_pretrained(processor_name)
```
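
As a minimal usage sketch with the standard `CLIPModel` API: the example image URL and candidate labels below are illustrative placeholders, not part of this model card.

```python
import requests
import torch
from PIL import Image

# Placeholder inputs for illustration; substitute your own image and labels.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
texts = ["a photo of a cat", "a photo of a dog"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Zero-shot classification: image-text similarities scaled by the learned temperature.
probs = outputs.logits_per_image.softmax(dim=-1)
print(probs)

# Raw embeddings, e.g. for retrieval (pipeline_tag: feature-extraction).
image_features = model.get_image_features(pixel_values=inputs["pixel_values"])
text_features = model.get_text_features(
    input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"]
)
```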