# all-types-of-transformers.md
Perfect ✅ Here’s a ready-to-use table of Hugging Face Transformers models for different data types, with their exact model IDs you can load directly.
## 🔹 Hugging Face Transformer Models by Data Type
| Data Type | Model Name | Hugging Face ID |
| ----------------------------------- | --------------------------- | --------------------------------------------- |
| **Text (NLP)** | BERT | `bert-base-uncased` |
| | GPT-2 | `gpt2` |
| | T5 | `t5-small` |
| | DistilBERT | `distilbert-base-uncased` |
| **Image (Vision)** | Vision Transformer (ViT) | `google/vit-base-patch16-224` |
| | DeiT (Data-efficient ViT) | `facebook/deit-base-distilled-patch16-224` |
| | BEiT | `microsoft/beit-base-patch16-224-pt22k-ft22k` |
| | Swin Transformer | `microsoft/swin-tiny-patch4-window7-224` |
| **Audio / Speech** | Wav2Vec2 (ASR) | `facebook/wav2vec2-base-960h` |
| | Whisper (ASR + translation) | `openai/whisper-small` |
| | HuBERT (speech features) | `facebook/hubert-base-ls960` |
| | SpeechT5 (ASR) | `microsoft/speecht5_asr` |
| **Video** | TimeSformer | `facebook/timesformer-base-finetuned-k400` |
| | VideoMAE | `MCG-NJU/videomae-base` |
| **Multimodal (Text + Image/Video)** | CLIP | `openai/clip-vit-base-patch32` |
| | BLIP (captioning, VQA) | `Salesforce/blip-image-captioning-base` |
| | ViLT (vision-language) | `dandelin/vilt-b32-finetuned-coco` |
## 🔹 Example Usage (generic)
```python
from transformers import AutoModel, AutoProcessor  # for text models, use AutoTokenizer instead of AutoProcessor

# Example: load a vision transformer and its matching preprocessor
model_id = "google/vit-base-patch16-224"
model = AutoModel.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)

print("Loaded:", model_id)
```
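For quick inference, the same model IDs also work with the `pipeline` API, which bundles preprocessor, model, and postprocessing for a given task. A hedged sketch of which standard pipeline task fits a few of the table's models (the mapping itself is illustrative, and the audio file path in the usage comment is a placeholder):

```python
# Which pipeline task a few of the table's model IDs map to.
# (Illustrative mapping; the task names are standard transformers pipeline tasks.)
TASK_FOR_MODEL = {
    "google/vit-base-patch16-224": "image-classification",
    "facebook/wav2vec2-base-960h": "automatic-speech-recognition",
    "openai/whisper-small": "automatic-speech-recognition",
    "Salesforce/blip-image-captioning-base": "image-to-text",
}

# Usage (downloads model weights on first call):
# from transformers import pipeline
# asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
# print(asr("speech.wav")["text"])
```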
✅ This table should save you time: just copy the model ID and plug it into `AutoModel` / `AutoProcessor` / `AutoTokenizer`, depending on the task.
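That rule of thumb can be sketched as a small helper. This is a hypothetical dispatcher (not part of transformers): it maps a modality to the Auto class that prepares its inputs, then loads both halves from any of the table's model IDs.

```python
from importlib import import_module

# Hypothetical lookup (not part of transformers): which Auto class
# prepares inputs for each data type in the table above.
AUTO_CLASS_BY_MODALITY = {
    "text": "AutoTokenizer",        # BERT, GPT-2, T5, DistilBERT
    "image": "AutoProcessor",       # ViT, DeiT, BEiT, Swin
    "audio": "AutoProcessor",       # Wav2Vec2, Whisper, HuBERT, SpeechT5
    "video": "AutoProcessor",       # TimeSformer, VideoMAE
    "multimodal": "AutoProcessor",  # CLIP, BLIP, ViLT
}

def load(modality: str, model_id: str):
    """Load a model plus the matching preprocessor for a modality."""
    transformers = import_module("transformers")  # deferred so the lookup works without it
    pre_cls = getattr(transformers, AUTO_CLASS_BY_MODALITY[modality])
    model = transformers.AutoModel.from_pretrained(model_id)
    return model, pre_cls.from_pretrained(model_id)

# e.g. model, tokenizer = load("text", "distilbert-base-uncased")
```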
👉 Do you want me to also write a full script that auto-detects the data type (text, image, audio, video) and loads the right Hugging Face model? That way, you can just pass "text", "image", "audio", or "video" and it works automatically.