# 🧠 Custom GPT from Scratch — Saved in Safetensors

This repository contains a minimal GPT-style Transformer built entirely from scratch in PyTorch and integrated with the Hugging Face Trainer for easy training, evaluation, and saving. Unlike a fine-tuning project, it does not start from a pre-trained model: the Transformer weights are initialized randomly and trained end to end on a small custom dataset.
## 📂 Features

- Custom GPT architecture — written in pure PyTorch
- From-scratch training — no pre-trained weights
- Hugging Face Trainer integration — handles the training loop, evaluation, and logging
- Tokenizer compatibility — uses the GPT-2 tokenizer for convenience
- Safetensors format — safe, portable model checkpointing
- Tiny dataset — quick training for learning purposes
## 📜 How It Works

- SimpleGPTConfig — stores the model hyperparameters
- CausalSelfAttention — implements causally masked multi-head self-attention
- Block — a Transformer block with LayerNorm, attention, and a feed-forward network
- SimpleGPTLMHeadModel — the complete GPT model with a language-modeling head
- Trainer setup — defines the dataset, tokenizer, data collator, and training arguments
- Training & saving — the trained model is saved as model.safetensors (see the condensed sketch after this list)
## 🚀 Getting Started

### 1️⃣ Install dependencies

```bash
pip install torch transformers datasets accelerate safetensors
```
### 2️⃣ Train the model

```bash
python train.py
```

This will train on a small text dataset and save the model to `./mini_custom_transformer_safetensors`.
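Internally, train.py follows the standard Trainer recipe. The sketch below is a plausible reconstruction, reusing the classes from the How It Works sketch above; the corpus, epoch count, and batch size are made-up placeholders.

```python
from datasets import Dataset
from transformers import (AutoTokenizer, DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

# GPT-2's tokenizer has no pad token, so reuse the EOS token for padding.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

# Tiny illustrative corpus; replace with your own text.
texts = ["hello world, this is a tiny dataset.", "transformers are fun to build."]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=64),
    remove_columns=["text"],
)

model = SimpleGPTLMHeadModel(SimpleGPTConfig(vocab_size=len(tokenizer)))

args = TrainingArguments(
    output_dir="./mini_custom_transformer_safetensors",
    num_train_epochs=20,
    per_device_train_batch_size=2,
    save_safetensors=True,  # write model.safetensors rather than a pickle-based .bin
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    # mlm=False produces causal-LM labels (the shift happens inside the model).
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("./mini_custom_transformer_safetensors")
tokenizer.save_pretrained("./mini_custom_transformer_safetensors")
```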
## 🗂 Repository Structure

```text
├── train.py      # Main training script
├── README.md     # Project documentation
└── mini_custom_transformer_safetensors/
    ├── config.json
    ├── model.safetensors
    └── tokenizer.json
```
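These three output files are all that is needed for generation. Because the architecture is custom, AutoModel cannot load it without extra registration; instead, rebuild the classes from train.py and load the weights directly. A minimal sketch using greedy decoding (the prompt and token count are arbitrary):

```python
import torch
from safetensors.torch import load_file
from transformers import AutoTokenizer

path = "./mini_custom_transformer_safetensors"
tokenizer = AutoTokenizer.from_pretrained(path)

# Rebuild the custom model, then load the safetensors weights into it.
config = SimpleGPTConfig.from_pretrained(path)
model = SimpleGPTLMHeadModel(config)
model.load_state_dict(load_file(f"{path}/model.safetensors"))
model.eval()

ids = tokenizer("hello", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(20):  # greedy decoding: always pick the most likely next token
        logits = model(ids[:, -config.block_size:])["logits"]
        next_id = logits[:, -1].argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=1)
print(tokenizer.decode(ids[0]))
```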
## 💡 Why Safetensors?

- Security — avoids the arbitrary-code-execution risk of pickle-based .bin checkpoints
- Speed — faster loading on CPU and GPU
- Interoperability — works with Hugging Face models out of the box
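For example, a .safetensors checkpoint can be opened as a plain dictionary of tensors, with no unpickling step that could execute code:

```python
from safetensors.torch import load_file

# Reads raw tensors only; nothing in the file is ever executed.
state_dict = load_file("mini_custom_transformer_safetensors/model.safetensors")
for name, tensor in state_dict.items():
    print(name, tuple(tensor.shape))
```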
## 📌 Notes

- This is a learning example, not intended for production-level performance.
- Since it trains from scratch on a tiny dataset, output quality will be limited.
- You can expand the dataset and train longer for better results.
## 📜 License

MIT License — feel free to use, modify, and share.