Commit a048182 (1 parent: 938f3dd) by vaishaal: Update README.md

Files changed (1): README.md (+109 −0)

---
license: other
license_name: apple-sample-code-license
license_link: LICENSE
---

A CLIP (Contrastive Language-Image Pre-training) model trained on DFN-2B.
Data Filtering Networks (DFNs) are small networks used to automatically filter large pools of uncurated data.
This model was trained on the 2B images that were filtered from CommonPool-12.8B, an uncurated pool of 12.8B image-text pairs.
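
To make the filtering step concrete, here is a minimal, hypothetical sketch of DFN-style filtering: a small CLIP-like scoring network ranks image-text pairs by cosine similarity, and only pairs scoring above a cutoff are kept for training. The `dfn` model and `threshold` value below are illustrative assumptions, not artifacts shipped with this repository.

```python
import torch
import torch.nn.functional as F

def dfn_filter(dfn, pairs, threshold=0.3):
    """Keep image-text pairs that the filtering network scores above a threshold.

    Assumes `dfn` exposes encode_image/encode_text like an OpenCLIP model, and
    `pairs` is an iterable of (preprocessed_image, tokenized_text) tensors.
    """
    kept = []
    with torch.no_grad():
        for image, text in pairs:
            img_feat = F.normalize(dfn.encode_image(image), dim=-1)
            txt_feat = F.normalize(dfn.encode_text(text), dim=-1)
            score = (img_feat * txt_feat).sum(dim=-1)  # cosine similarity
            if score.item() > threshold:  # hypothetical cutoff
                kept.append((image, text))
    return kept
```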

This model was converted to PyTorch from the original JAX checkpoints produced by Axlearn (https://github.com/apple/axlearn).
These weights are directly usable in OpenCLIP (image + text).

## Model Details

- **Model Type:** Contrastive Image-Text, Zero-Shot Image Classification.
- **Dataset:** DFN-2B
- **Papers:**
  - Data Filtering Networks: https://arxiv.org/abs/2309.17425
- **Examples Seen:** 12.8B

## Model Metrics
| dataset                | metric   |
|:-----------------------|---------:|
| ImageNet 1k            | 0.81396  |
| Caltech-101            | 0.953141 |
| CIFAR-10               | 0.9836   |
| CIFAR-100              | 0.8835   |
| CLEVR Counts           | 0.3338   |
| CLEVR Distance         | 0.248733 |
| Country211             | 0.28237  |
| Describable Textures   | 0.66117  |
| EuroSAT                | 0.646296 |
| FGVC Aircraft          | 0.395945 |
| Food-101               | 0.945861 |
| GTSRB                  | 0.616152 |
| ImageNet Sketch        | 0.683311 |
| ImageNet v2            | 0.7453   |
| ImageNet-A             | 0.6676   |
| ImageNet-O             | 0.3915   |
| ImageNet-R             | 0.900033 |
| KITTI Vehicle Distance | 0.201125 |
| MNIST                  | 0.8468   |
| ObjectNet              | 0.739367 |
| Oxford Flowers-102     | 0.865822 |
| Oxford-IIIT Pet        | 0.954941 |
| Pascal VOC 2007        | 0.81644  |
| PatchCamelyon          | 0.63028  |
| Rendered SST2          | 0.551345 |
| RESISC45               | 0.733175 |
| Stanford Cars          | 0.947146 |
| STL-10                 | 0.976625 |
| SUN397                 | 0.754565 |
| SVHN                   | 0.653503 |
| Flickr                 | 0.8244   |
| MSCOCO                 | 0.570363 |
| WinoGAViL              | 0.551645 |
| iWildCam               | 0.18877  |
| Camelyon17             | 0.626179 |
| FMoW                   | 0.222137 |
| Dollar Street          | 0.688084 |
| GeoDE                  | 0.91023  |
| **Average**            | **0.668558** |

## Model Usage
### With OpenCLIP
```python
import torch
import torch.nn.functional as F
from urllib.request import urlopen
from PIL import Image
from open_clip import create_model_from_pretrained, get_tokenizer

model, preprocess = create_model_from_pretrained('hf-hub:apple/DFN2B-CLIP-ViT-L-14')
tokenizer = get_tokenizer('ViT-L-14')

image = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))
image = preprocess(image).unsqueeze(0)

labels_list = ["a dog", "a cat", "a donut", "a beignet"]
text = tokenizer(labels_list, context_length=model.context_length)

with torch.no_grad(), torch.cuda.amp.autocast():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)

    # DFN2B-CLIP is a standard contrastive CLIP model: zero-shot label
    # probabilities come from a softmax over temperature-scaled similarities.
    text_probs = (image_features @ text_features.T * model.logit_scale.exp()).softmax(dim=-1)

zipped_list = list(zip(labels_list, [round(p.item(), 3) for p in text_probs[0]]))
print("Label probabilities: ", zipped_list)
```
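
Since DFN2B-CLIP was trained with the standard contrastive CLIP objective rather than a SigLIP-style sigmoid loss, the snippet scores labels with a temperature-scaled softmax over the image-text similarities; `model.logit_scale` holds the learned temperature, and there is no sigmoid `logit_bias` head to apply.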

## Citation
```bibtex
@article{fang2023data,
  title={Data Filtering Networks},
  author={Fang, Alex and Jose, Albin Madappally and Jain, Amit and Schmidt, Ludwig and Toshev, Alexander and Shankar, Vaishaal},
  journal={arXiv preprint arXiv:2309.17425},
  year={2023}
}
```