ankitkushwaha90 committed 4df1e58 (verified; parent: 8f6154a)

Create Generative_ai_transformers.md

Generative_ai_transformers.md ADDED

## 🔹 1. What your table shows

| Data Type | Transformer Type / Adaptation |
| ---------- | ----------------------------- |
| Text | GPT, T5, BERT |
| Image | ViT, ViT + generative decoder |
| Audio | SpeechT5, MusicLM |
| Video | TimeSformer, VideoMAE |
| Multimodal | CLIP, BLIP |

Observation:

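Some of these models are purely generative (e.g. GPT), some are discriminative (e.g. BERT, CLIP), and some can play either role depending on how they are used (e.g. ViT with or without a generative decoder).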
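
For the text models, one key architectural difference between a generative GPT-style (decoder-only) model and a BERT-style (encoder-only) model is the causal attention mask. The NumPy sketch below is a simplification, assuming a single attention head with identity query/key/value projections (real models use learned projections, multiple heads and many layers); `self_attention` is a hypothetical helper, not a library function. It checks that, with the mask, outputs for earlier tokens do not depend on later tokens, which is what allows left-to-right, token-by-token generation.

```python
import numpy as np

def self_attention(x, causal=False):
    """Single-head self-attention with identity Q/K/V projections (simplified sketch)."""
    # x: (seq_len, d_model). Real models apply learned W_q, W_k, W_v first; omitted here.
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)  # (seq_len, seq_len) scaled dot-product logits
    if causal:
        # GPT-style mask: position i may only attend to positions j <= i
        mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    # Row-wise softmax (subtract the row max for numerical stability)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))   # 4 tokens with 8-dimensional embeddings
x_changed = x.copy()
x_changed[-1] += 10.0         # perturb only the last token

# Causal (decoder-only, GPT-like): outputs for the first 3 tokens ignore the last token
print(np.allclose(self_attention(x, causal=True)[:3],
                  self_attention(x_changed, causal=True)[:3]))   # True
# Bidirectional (encoder-only, BERT-like): every output sees the whole sequence
print(np.allclose(self_attention(x, causal=False)[:3],
                  self_attention(x_changed, causal=False)[:3]))  # False
```

Without the mask, each position sees the whole sequence, which suits BERT-style tasks such as classification or filling in masked tokens, but not step-by-step generation.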

## 🔹 2. Which are generative?

| Data Type | Generative? | Notes |
| ---------- | ----------- | ----- |
| Text | ✅ Yes (GPT; T5, which is text-to-text) | GPT (decoder-only) and T5 (encoder-decoder) generate text sequences. BERT (encoder-only) on its own is mostly discriminative. |
| Image | ✅ Yes (ViT + generative decoder, diffusion models) | ViT itself is a **feature extractor**; a generative decoder (or a separate diffusion model) is needed to create images. |
| Audio | ✅ Yes (SpeechT5, MusicLM) | SpeechT5 can generate speech (text-to-speech); MusicLM generates music from text descriptions. |
| Video | ✅ Yes, with a decoder (TimeSformer and VideoMAE alone are encoders) | TimeSformer (video classification) and VideoMAE (masked-autoencoder pretraining) produce embeddings; generating video requires a decoder after the transformer embeddings. |
| Multimodal | ✅ Yes (BLIP; CLIP only inside a generation pipeline) | BLIP can generate image captions. CLIP itself is discriminative (image-text alignment), but it can guide a generative decoder for text-to-image, e.g. VQGAN + CLIP. |
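
The CLIP note can be made concrete with a minimal sketch of CLIP-style scoring: normalise an image embedding and several caption embeddings, take scaled cosine similarities, and apply a softmax over the captions. The embeddings here are made-up vectors standing in for encoder outputs, `clip_style_scores` is a hypothetical helper, and `temperature=0.07` is an assumed value. The point is that the model only ranks existing candidates; it never produces new pixels or text.

```python
import numpy as np

def clip_style_scores(image_emb, text_embs, temperature=0.07):
    """Score one image embedding against several caption embeddings, CLIP-style.

    Assumes the embeddings already came from separate image and text encoders;
    temperature=0.07 is an assumed value for this sketch.
    """
    img = image_emb / np.linalg.norm(image_emb)                          # L2-normalise the image vector
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)   # L2-normalise each caption
    logits = txt @ img / temperature        # temperature-scaled cosine similarities
    probs = np.exp(logits - logits.max())   # numerically stable softmax over captions
    return probs / probs.sum()

image_emb = np.array([0.9, 0.1, 0.0])        # hypothetical image embedding
captions = ["a photo of a dog", "a photo of a cat", "a diagram"]
text_embs = np.array([[1.0, 0.0, 0.0],       # hypothetical caption embeddings
                      [0.0, 1.0, 0.0],
                      [0.0, 0.0, 1.0]])

probs = clip_style_scores(image_emb, text_embs)
print(captions[int(np.argmax(probs))])  # a photo of a dog
```

In VQGAN + CLIP style pipelines, a score like this acts as the guidance signal: the VQGAN output is adjusted step by step to increase the CLIP match, while VQGAN does the actual image generation.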