Create Generative_ai_transformers.md
Generative_ai_transformers.md
ADDED
@@ -0,0 +1,22 @@
## 🔹 1. What your table shows

| Data Type  | Transformer Type / Adaptation |
| ---------- | ----------------------------- |
| Text       | GPT, T5, BERT                 |
| Image      | ViT, ViT + generative decoder |
| Audio      | SpeechT5, MusicLM             |
| Video      | TimeSformer, VideoMAE         |
| Multimodal | CLIP, BLIP                    |

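Observation:

Some of these models are generative (GPT, T5, SpeechT5, MusicLM), some are discriminative or representation-learning encoders (BERT, ViT, TimeSformer, VideoMAE, CLIP), and some can play either role depending on how they are used (for example, BLIP for captioning versus retrieval).
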
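To make the "depends on usage" point concrete, here is a minimal toy sketch in plain Python. It is not any real model's API: `VOCAB`, `NEXT_SCORES`, `HEAD`, `backbone`, `generate`, and `classify` are all hypothetical names, and the score table stands in for a trained transformer. The same backbone output is used once to produce new tokens (generative) and once to pick a label (discriminative).

```python
# Toy stand-in for a transformer backbone. All names are hypothetical.
VOCAB = ["<eos>", "the", "cat", "sat", "down"]

# Hypothetical next-token scores keyed by the last token only.
# A real transformer would attend over the whole sequence.
NEXT_SCORES = {
    "the": [0.0, 0.0, 0.9, 0.1, 0.0],
    "cat": [0.0, 0.0, 0.0, 0.8, 0.2],
    "sat": [0.1, 0.0, 0.0, 0.0, 0.9],
    "down": [1.0, 0.0, 0.0, 0.0, 0.0],
}

# Hypothetical classification head: one weight vector per label.
HEAD = {
    "continues": [0.0, 1.0, 1.0, 1.0, 1.0],
    "ends": [1.0, 0.0, 0.0, 0.0, 0.0],
}


def backbone(tokens):
    """Return one score per vocabulary entry for the given sequence."""
    return NEXT_SCORES[tokens[-1]]


def generate(prompt, max_new_tokens=10):
    """Generative use: greedily append the highest-scoring next token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        scores = backbone(tokens)
        next_token = VOCAB[scores.index(max(scores))]
        if next_token == "<eos>":
            break
        tokens.append(next_token)
    return tokens


def classify(tokens):
    """Discriminative use: map the backbone output to a single label."""
    features = backbone(tokens)

    def head_score(label):
        return sum(w * f for w, f in zip(HEAD[label], features))

    return max(HEAD, key=head_score)


generated = generate(["the"])
label = classify(["sat", "down"])
```

In this sketch the backbone never changes; only the code wrapped around it decides whether the output is new content or a label, which is the distinction the observation above draws.
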
## 🔹 2. Which are generative?
| Data Type  | Generative?                                                          | Notes                                                                                                                                                                 |
| ---------- | -------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Text       | ✅ Yes (GPT, T5 in generation mode)                                  | Can generate text sequences. BERT alone is mostly discriminative.                                                                                                     |
| Image      | ✅ Yes (ViT + decoder, Diffusion)                                    | ViT itself is a **feature extractor**; a generative decoder is needed to create images.                                                                               |
| Audio      | ✅ Yes (SpeechT5, MusicLM)                                           | SpeechT5 can generate speech (TTS); MusicLM can generate music.                                                                                                       |
| Video      | ✅ Only with a decoder (VideoMAE or TimeSformer embeddings + decoder) | TimeSformer and VideoMAE are encoders for video understanding; video generation requires a generative decoder on top of the transformer embeddings.                   |
| Multimodal | ✅ Yes (BLIP; CLIP used in a generation pipeline)                    | BLIP can generate captions. CLIP itself is discriminative (image-text alignment), but it is used alongside a generative decoder (e.g., text-to-image with VQGAN + CLIP). |
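
The multimodal row describes CLIP as a discriminative alignment model that is still useful inside a generation pipeline. A rough sketch of that role, assuming toy hand-written vectors rather than real CLIP embeddings: a separate generator proposes candidates, and an alignment score (cosine similarity here) ranks them against the text prompt. The names `cosine`, `rank_candidates`, `text_embedding`, and `candidates` are hypothetical and do not come from any library.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_candidates(text_vec, image_vecs):
    """Order candidate names from best to worst alignment with the text."""
    scored = [(cosine(text_vec, vec), name) for name, vec in image_vecs.items()]
    return [name for _, name in sorted(scored, reverse=True)]


# Hypothetical embedding of the text prompt (assumed, not real CLIP output).
text_embedding = [1.0, 0.0, 1.0]

# Hypothetical embeddings of images proposed by some separate generator.
candidates = {
    "candidate_a": [0.9, 0.1, 0.8],
    "candidate_b": [0.0, 1.0, 0.0],
    "candidate_c": [0.5, 0.5, 0.5],
}

ranking = rank_candidates(text_embedding, candidates)
```

The scorer creates nothing itself; it only tells the pipeline which generated candidate best matches the prompt, which is why the table marks CLIP as discriminative even though it appears in generative systems.
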