---
license: mit
datasets:
- fka/awesome-chatgpt-prompts
language:
- en
metrics:
- character
pipeline_tag: text-generation
library_name: transformers
tags:
- code
---

## 🧠 Custom GPT from Scratch (Saved in Safetensors)

This repository contains a minimal GPT-style Transformer built entirely from scratch in PyTorch and integrated with the Hugging Face `Trainer` for easy training, evaluation, and saving.
Unlike fine-tuning, this project does not start from a pre-trained model: the Transformer weights are initialized randomly and trained entirely on a small custom dataset.

## 📂 Features

- **Custom GPT architecture**: written in pure PyTorch
- **From-scratch training**: no pre-trained weights
- **Hugging Face Trainer integration**: handles the training loop, evaluation, and logging
- **Tokenizer compatibility**: uses the GPT-2 tokenizer for convenience
- **Safetensors format**: safe, portable model checkpointing
- **Tiny dataset**: quick training for learning purposes

## 📜 How It Works

- `SimpleGPTConfig`: stores the model hyperparameters
- `CausalSelfAttention`: implements causally masked multi-head self-attention
- `Block`: a Transformer block with LayerNorm, attention, and a feed-forward network
- `SimpleGPTLMHeadModel`: the complete GPT model with a language-modeling head
- Trainer setup: defines the dataset, tokenizer, data collator, and training arguments
- Training & saving: the trained model is saved as `model.safetensors`
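
For orientation, here is a condensed sketch of how these components fit together. It is illustrative only: the class names come from the list above, but the hyperparameters, layer sizes, and implementation details are assumptions rather than the exact contents of `train.py`.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleGPTConfig:
    """Plain container for the hyperparameters (the values here are placeholders)."""
    def __init__(self, vocab_size=50257, block_size=128, n_layer=4, n_head=4, n_embd=256):
        self.vocab_size = vocab_size
        self.block_size = block_size
        self.n_layer = n_layer
        self.n_head = n_head
        self.n_embd = n_embd


class CausalSelfAttention(nn.Module):
    """Multi-head self-attention with a lower-triangular (causal) mask."""
    def __init__(self, config):
        super().__init__()
        self.n_head = config.n_head
        self.qkv = nn.Linear(config.n_embd, 3 * config.n_embd)
        self.proj = nn.Linear(config.n_embd, config.n_embd)
        mask = torch.tril(torch.ones(config.block_size, config.block_size))
        self.register_buffer("mask", mask.view(1, 1, config.block_size, config.block_size))

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / math.sqrt(k.size(-1))
        att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
        y = (F.softmax(att, dim=-1) @ v).transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y)


class Block(nn.Module):
    """Pre-norm Transformer block: attention and MLP, each with a residual connection."""
    def __init__(self, config):
        super().__init__()
        self.ln1 = nn.LayerNorm(config.n_embd)
        self.attn = CausalSelfAttention(config)
        self.ln2 = nn.LayerNorm(config.n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(config.n_embd, 4 * config.n_embd),
            nn.GELU(),
            nn.Linear(4 * config.n_embd, config.n_embd),
        )

    def forward(self, x):
        x = x + self.attn(self.ln1(x))
        return x + self.mlp(self.ln2(x))


class SimpleGPTLMHeadModel(nn.Module):
    """Token + position embeddings, a stack of Blocks, and a language-modeling head."""
    def __init__(self, config):
        super().__init__()
        self.tok_emb = nn.Embedding(config.vocab_size, config.n_embd)
        self.pos_emb = nn.Embedding(config.block_size, config.n_embd)
        self.blocks = nn.ModuleList(Block(config) for _ in range(config.n_layer))
        self.ln_f = nn.LayerNorm(config.n_embd)
        self.lm_head = nn.Linear(config.n_embd, config.vocab_size, bias=False)

    def forward(self, input_ids, attention_mask=None, labels=None):
        # attention_mask is accepted (the HF data collator passes it) but unused here.
        B, T = input_ids.shape
        pos = torch.arange(T, device=input_ids.device)
        x = self.tok_emb(input_ids) + self.pos_emb(pos)
        for block in self.blocks:
            x = block(x)
        logits = self.lm_head(self.ln_f(x))
        loss = None
        if labels is not None:
            # Shift so each position predicts the next token; -100 labels are ignored.
            loss = F.cross_entropy(
                logits[:, :-1].reshape(-1, logits.size(-1)),
                labels[:, 1:].reshape(-1),
            )
        return {"loss": loss, "logits": logits}
```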

## 🚀 Getting Started

**1️⃣ Install dependencies**

```bash
pip install torch transformers datasets accelerate safetensors
```

**2️⃣ Train the model**

```bash
python train.py
```

This trains on a small text dataset and saves the model to `./mini_custom_transformer_safetensors/`.
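
To make the Trainer wiring concrete, here is a hypothetical sketch of what the setup can look like, reusing the `SimpleGPTConfig` and `SimpleGPTLMHeadModel` classes sketched above. The sample texts, hyperparameters, and argument values are placeholders; the actual `train.py` may differ.

```python
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# GPT-2 tokenizer for convenience; it has no pad token, so reuse EOS.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

# Placeholder tiny dataset; replace with your own text.
texts = ["hello world", "tiny models train fast", "a transformer built from scratch"]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=128),
    remove_columns=["text"],
)

config = SimpleGPTConfig(vocab_size=tokenizer.vocab_size)
model = SimpleGPTLMHeadModel(config)

args = TrainingArguments(
    output_dir="./mini_custom_transformer_safetensors",
    per_device_train_batch_size=2,
    num_train_epochs=3,
    logging_steps=10,
    save_safetensors=True,   # write checkpoints as model.safetensors
    report_to="none",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    # mlm=False copies input_ids into labels for causal language modeling.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()

trainer.save_model("./mini_custom_transformer_safetensors")        # model.safetensors
tokenizer.save_pretrained("./mini_custom_transformer_safetensors")  # tokenizer.json
```

The actual script presumably also writes the `config.json` shown in the repository structure, for example by subclassing `PreTrainedModel`/`PretrainedConfig` or dumping the hyperparameters manually; that step is omitted from this sketch.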

## 🗂 Repository Structure

```text
├── train.py                             # Main training script
├── README.md                            # Project documentation
└── mini_custom_transformer_safetensors/
    ├── config.json
    ├── model.safetensors
    └── tokenizer.json
```

## 💡 Why Safetensors?

- **Security**: avoids the arbitrary-code-execution risk of pickle-based `.bin` checkpoints
- **Speed**: faster loading on CPU and GPU
- **Interoperability**: works with Hugging Face models out of the box

## 📌 Notes

- This is a learning example, not intended for production-level performance.
- Because it trains from scratch on a tiny dataset, output quality will be limited.
- Expand the dataset and train longer for better results.

## 📜 License

MIT License. Feel free to use, modify, and share.

## 🔍 Example Inference

After training, you can load `model.safetensors` and generate text right away, so the repository covers both training and usage.
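
The snippet below is a hypothetical sketch: it assumes the model and config classes from `train.py` are importable (and that the script guards its training loop behind `if __name__ == "__main__"`), and that the tokenizer was saved into the output directory. Adjust the names and paths to match the actual script.

```python
import torch
from safetensors.torch import load_file
from transformers import AutoTokenizer

# Hypothetical import: the classes defined in train.py.
from train import SimpleGPTConfig, SimpleGPTLMHeadModel

save_dir = "./mini_custom_transformer_safetensors"
tokenizer = AutoTokenizer.from_pretrained(save_dir)

# Rebuild the architecture and load the safetensors weights into it.
config = SimpleGPTConfig(vocab_size=tokenizer.vocab_size)
model = SimpleGPTLMHeadModel(config)
model.load_state_dict(load_file(f"{save_dir}/model.safetensors"))
model.eval()

# Simple greedy decoding, one token at a time.
prompt = "Once upon a time"
input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
with torch.no_grad():
    for _ in range(50):
        logits = model(input_ids)["logits"]
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=1)

print(tokenizer.decode(input_ids[0], skip_special_tokens=True))
```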