---
license: mit
datasets:
- fka/awesome-chatgpt-prompts
language:
- en
metrics:
- character
pipeline_tag: text-generation
library_name: transformers
tags:
- code
---
# 🧠 Custom GPT from Scratch — Saved in Safetensors
This repository contains a minimal GPT-style Transformer built entirely from scratch in PyTorch and integrated with the Hugging Face `Trainer` for easy training, evaluation, and saving. Unlike fine-tuning, this project does not start from a pre-trained model: the Transformer weights are initialized randomly and trained end-to-end on a small custom dataset.
## 📂 Features

- **Custom GPT architecture**: written in pure PyTorch
- **From-scratch training**: no pre-trained weights
- **Hugging Face `Trainer` integration**: handles the training loop, evaluation, and logging
- **Tokenizer compatibility**: reuses the GPT-2 tokenizer for convenience
- **Safetensors format**: safe, portable model checkpointing
- **Tiny dataset**: quick training for learning purposes
## 📜 How it Works

- **`SimpleGPTConfig`**: stores the model hyperparameters
- **`CausalSelfAttention`**: implements causally masked multi-head self-attention
- **`Block`**: a Transformer block with LayerNorm, attention, and a feed-forward network (see the sketch after this list)
- **`SimpleGPTLMHeadModel`**: the complete GPT model with a language-modeling head
- **Trainer setup**: defines the dataset, tokenizer, data collator, and training arguments
- **Training & saving**: the trained model is saved as `model.safetensors`
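The exact implementation lives in `train.py`. As a rough illustration, the attention module and Transformer block described above typically look something like the following minimal PyTorch sketch (the module names match the list above, but the hyperparameters and details are assumptions, not the repository's exact code):

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Multi-head self-attention with a causal mask (illustrative sketch)."""
    def __init__(self, n_embd=128, n_head=4, block_size=128):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.qkv = nn.Linear(n_embd, 3 * n_embd)   # fused query/key/value projection
        self.proj = nn.Linear(n_embd, n_embd)      # output projection
        # lower-triangular mask: each position may only attend to earlier positions
        self.register_buffer("mask", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # reshape to (batch, heads, seq_len, head_dim)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / math.sqrt(k.size(-1))
        att = att.masked_fill(self.mask[:T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        y = (att @ v).transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y)

class Block(nn.Module):
    """Pre-norm Transformer block: LayerNorm -> attention -> LayerNorm -> MLP."""
    def __init__(self, n_embd=128, n_head=4, block_size=128):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = CausalSelfAttention(n_embd, n_head, block_size)
        self.ln2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd), nn.GELU(), nn.Linear(4 * n_embd, n_embd)
        )

    def forward(self, x):
        x = x + self.attn(self.ln1(x))   # residual connection around attention
        x = x + self.mlp(self.ln2(x))    # residual connection around the MLP
        return x
```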
## 🚀 Getting Started

### 1️⃣ Install dependencies

```bash
pip install torch transformers datasets accelerate safetensors
```
### 2️⃣ Train the model

```bash
python train.py
```

This trains the model on a small text dataset and saves it to `./mini_custom_transformer_safetensors`.
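For reference, the `Trainer` wiring in `train.py` looks roughly like the sketch below. The dataset, hyperparameters, and output paths are placeholders, and a tiny randomly initialized `GPT2LMHeadModel` stands in for the repository's custom `SimpleGPTLMHeadModel` so the snippet runs on its own:

```python
from datasets import Dataset
from transformers import (AutoTokenizer, DataCollatorForLanguageModeling,
                          GPT2Config, GPT2LMHeadModel, Trainer, TrainingArguments)

# GPT-2 tokenizer reused purely for convenience; give it a pad token for batching.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

# Tiny toy dataset (placeholder text).
texts = ["hello world", "training a transformer from scratch"]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=128),
    remove_columns=["text"],
)

# Stand-in model; in this repo it would be SimpleGPTLMHeadModel(SimpleGPTConfig(...)).
model = GPT2LMHeadModel(GPT2Config(n_layer=2, n_head=2, n_embd=128, n_positions=128))

args = TrainingArguments(
    output_dir="./mini_custom_transformer_safetensors",
    per_device_train_batch_size=2,
    num_train_epochs=3,
    save_safetensors=True,   # write model.safetensors instead of pytorch_model.bin
    report_to="none",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("./mini_custom_transformer_safetensors")
tokenizer.save_pretrained("./mini_custom_transformer_safetensors")
```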
## 🗂 Repository Structure

```text
├── train.py                               # Main training script
├── README.md                              # Project documentation
└── mini_custom_transformer_safetensors/
    ├── config.json
    ├── model.safetensors
    └── tokenizer.json
```
## 💡 Why Safetensors?

- **Security**: avoids the arbitrary-code-execution risk of pickle-based `.bin` checkpoints (see the loading snippet below)
- **Speed**: faster loading on CPU and GPU
- **Interoperability**: works with Hugging Face models out of the box
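As a quick illustration, a `.safetensors` checkpoint is just a flat dictionary of named tensors and can be inspected without executing any pickled code. The path below assumes the default output directory from the training step:

```python
from safetensors.torch import load_file

# Load the raw tensor dictionary; no pickle code is executed.
state_dict = load_file("./mini_custom_transformer_safetensors/model.safetensors")

for name, tensor in state_dict.items():
    print(name, tuple(tensor.shape), tensor.dtype)
```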
## 📌 Notes

- This is a learning example and is not intended for production-level performance.
- Because it trains from scratch on a tiny dataset, output quality will be limited.
- You can expand the dataset and train longer for better results.
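## ✍️ Example Inference (sketch)

The repository does not currently ship an inference script. The snippet below is a rough sketch of how one could load `model.safetensors` and greedily generate a few tokens. It assumes that `SimpleGPTConfig` and `SimpleGPTLMHeadModel` are importable from `train.py` with the same hyperparameters used for training, and that the model's forward pass returns logits of shape `(batch, seq_len, vocab_size)`; adapt it to the actual code as needed.

```python
import torch
from safetensors.torch import load_file
from transformers import AutoTokenizer

from train import SimpleGPTConfig, SimpleGPTLMHeadModel  # assumed importable

out_dir = "./mini_custom_transformer_safetensors"
tokenizer = AutoTokenizer.from_pretrained(out_dir)

# Rebuild the architecture with the training-time hyperparameters, then load the weights.
model = SimpleGPTLMHeadModel(SimpleGPTConfig())
model.load_state_dict(load_file(f"{out_dir}/model.safetensors"))
model.eval()

input_ids = tokenizer("Once upon a time", return_tensors="pt").input_ids

# Simple greedy decoding loop (assumes the forward pass returns raw logits).
with torch.no_grad():
    for _ in range(20):
        logits = model(input_ids)
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=1)

print(tokenizer.decode(input_ids[0]))
```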
## 📜 License
MIT License — feel free to use, modify, and share.