ankitkushwaha90 committed
Commit 601f8d4 · verified · 1 Parent(s): 5464646

Create README.md

Files changed (1): README.md (+92 -0)
README.md ADDED

---
license: mit
datasets:
- fka/awesome-chatgpt-prompts
language:
- en
metrics:
- character
base_model:
- openai/gpt-oss-20b
new_version: openai/gpt-oss-20b
pipeline_tag: text-generation
library_name: transformers
tags:
- code
---

## 🧠 Custom GPT from Scratch — Saved in Safetensors

This repository contains a minimal GPT-style Transformer built completely from scratch in PyTorch and integrated with the Hugging Face Trainer for easy training, evaluation, and saving.
Unlike fine-tuning, this project does not start from a pre-trained model — the Transformer weights are initialized randomly and trained entirely on a small custom dataset.

## 📂 Features

- Custom GPT architecture — written in pure PyTorch
- From scratch training — no pre-trained weights
- Hugging Face Trainer integration for training loop, evaluation, and logging
- Tokenizer compatibility — uses the GPT-2 tokenizer for convenience
- Safetensors format — safe, portable model checkpointing
- Tiny dataset — quick training for learning purposes

## 📜 How it Works

- SimpleGPTConfig — stores model hyperparameters
- CausalSelfAttention — implements causally masked multi-head self-attention
- Block — Transformer block with LayerNorm, attention, and feed-forward network
- SimpleGPTLMHeadModel — complete GPT model with language modeling head
- Trainer setup — defines dataset, tokenizer, data collator, and training arguments
- Training & saving — the model is saved as model.safetensors (a sketch of how these pieces fit together follows this list)
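
The exact implementation lives in train.py; the sketch below shows one way these pieces could fit together. The hyperparameter names and default values (n_layer, n_head, n_embd, block_size) are illustrative assumptions, not necessarily what this repository uses.

```python
# Minimal from-scratch GPT sketch (assumed hyperparameter names; train.py may differ).
import math
from dataclasses import dataclass

import torch
import torch.nn as nn
import torch.nn.functional as F


@dataclass
class SimpleGPTConfig:
    vocab_size: int = 50257   # GPT-2 tokenizer vocabulary size
    block_size: int = 128     # maximum sequence length
    n_layer: int = 4
    n_head: int = 4
    n_embd: int = 128


class CausalSelfAttention(nn.Module):
    """Multi-head self-attention with a causal (lower-triangular) mask."""

    def __init__(self, config):
        super().__init__()
        self.n_head = config.n_head
        self.qkv = nn.Linear(config.n_embd, 3 * config.n_embd)
        self.proj = nn.Linear(config.n_embd, config.n_embd)
        mask = torch.tril(torch.ones(config.block_size, config.block_size))
        self.register_buffer("mask", mask.view(1, 1, config.block_size, config.block_size))

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / math.sqrt(k.size(-1))
        att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
        y = (F.softmax(att, dim=-1) @ v).transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y)


class Block(nn.Module):
    """Transformer block: LayerNorm, attention, LayerNorm, feed-forward, with residuals."""

    def __init__(self, config):
        super().__init__()
        self.ln1 = nn.LayerNorm(config.n_embd)
        self.attn = CausalSelfAttention(config)
        self.ln2 = nn.LayerNorm(config.n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(config.n_embd, 4 * config.n_embd),
            nn.GELU(),
            nn.Linear(4 * config.n_embd, config.n_embd),
        )

    def forward(self, x):
        x = x + self.attn(self.ln1(x))
        x = x + self.mlp(self.ln2(x))
        return x


class SimpleGPTLMHeadModel(nn.Module):
    """Token + position embeddings, a stack of Blocks, and a language-modeling head."""

    def __init__(self, config):
        super().__init__()
        self.config = config
        self.tok_emb = nn.Embedding(config.vocab_size, config.n_embd)
        self.pos_emb = nn.Embedding(config.block_size, config.n_embd)
        self.blocks = nn.ModuleList(Block(config) for _ in range(config.n_layer))
        self.ln_f = nn.LayerNorm(config.n_embd)
        self.lm_head = nn.Linear(config.n_embd, config.vocab_size, bias=False)

    def forward(self, input_ids, labels=None, **kwargs):
        B, T = input_ids.shape
        pos = torch.arange(T, device=input_ids.device)
        x = self.tok_emb(input_ids) + self.pos_emb(pos)
        for block in self.blocks:
            x = block(x)
        logits = self.lm_head(self.ln_f(x))
        loss = None
        if labels is not None:
            # shift so that position t predicts token t + 1
            loss = F.cross_entropy(
                logits[:, :-1].reshape(-1, logits.size(-1)),
                labels[:, 1:].reshape(-1),
                ignore_index=-100,
            )
        # dict output keeps the model compatible with the Hugging Face Trainer
        return {"loss": loss, "logits": logits}
```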

## 🚀 Getting Started

1️⃣ Install dependencies
```bash
pip install torch transformers datasets accelerate safetensors
```

2️⃣ Train the model
```bash
python train.py
```

This will train on a small text dataset and save the model to ./mini_custom_transformer_safetensors.
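
Under the hood, the Trainer wiring in train.py might look roughly like the sketch below. The tiny in-memory dataset and the training arguments shown here are placeholders, not the repository's actual values; only the overall pattern (GPT-2 tokenizer, causal-LM data collator, Trainer, safetensors checkpoint) follows what this README describes.

```python
# Sketch of the training wiring, assuming the SimpleGPTConfig / SimpleGPTLMHeadModel
# classes from the architecture sketch above; dataset and hyperparameters are placeholders.
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no padding token by default

texts = ["hello world, this is a tiny dataset.", "training a transformer from scratch is fun."]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=128),
    remove_columns=["text"],
)

model = SimpleGPTLMHeadModel(SimpleGPTConfig(vocab_size=len(tokenizer)))

args = TrainingArguments(
    output_dir="./mini_custom_transformer_safetensors",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    logging_steps=1,
    save_safetensors=True,  # write model.safetensors instead of a pickled .bin
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # causal-LM labels
)
trainer.train()
trainer.save_model("./mini_custom_transformer_safetensors")
tokenizer.save_pretrained("./mini_custom_transformer_safetensors")
```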

## 🗂 Repository Structure

```text
├── train.py                               # Main training script
├── README.md                              # Project documentation
└── mini_custom_transformer_safetensors/
    ├── config.json
    ├── model.safetensors
    └── tokenizer.json
```

## 💡 Why Safetensors?

- Security — avoids arbitrary code execution vulnerabilities in .bin files
- Speed — faster loading on CPU and GPU
- Interoperability — works with Hugging Face models out of the box
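
For a quick look at the format, the checkpoint can be read back into an ordinary PyTorch state dict with the safetensors library (the path below assumes the default output directory used in this README):

```python
# Inspect the saved weights (sketch).
from safetensors.torch import load_file

state_dict = load_file("mini_custom_transformer_safetensors/model.safetensors")
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))
```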

## 📌 Notes

- This is a learning example, not intended for production-level performance.
- Since it trains from scratch on a tiny dataset, output quality will be limited.
- You can expand the dataset and train longer for better results.

## 📜 License

MIT License — feel free to use, modify, and share.

## 🔍 Example Inference

So that the README covers usage as well as training, the sketch below loads model.safetensors and generates text immediately after training.
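
This is a minimal greedy-decoding sketch. It assumes the SimpleGPTConfig and SimpleGPTLMHeadModel classes from train.py (as sketched above) and the default output directory; adjust the names if your script differs.

```python
# Load the safetensors checkpoint and generate text greedily (sketch).
import torch
from safetensors.torch import load_file
from transformers import AutoTokenizer

out_dir = "./mini_custom_transformer_safetensors"
tokenizer = AutoTokenizer.from_pretrained(out_dir)

config = SimpleGPTConfig(vocab_size=len(tokenizer))
model = SimpleGPTLMHeadModel(config)
model.load_state_dict(load_file(f"{out_dir}/model.safetensors"))
model.eval()

input_ids = tokenizer("Hello", return_tensors="pt")["input_ids"]

with torch.no_grad():
    for _ in range(30):  # generate up to 30 new tokens
        context = input_ids[:, -config.block_size:]   # keep at most block_size tokens
        logits = model(context)["logits"]
        next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_token], dim=1)

print(tokenizer.decode(input_ids[0], skip_special_tokens=True))
```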