---
license: mit
datasets:
- fka/awesome-chatgpt-prompts
language:
- en
metrics:
- character
pipeline_tag: text-generation
library_name: transformers
tags:
- code
---
# 🧠 Custom GPT from Scratch — Saved in Safetensors
This repository contains a minimal GPT-style Transformer built entirely from scratch in PyTorch and integrated with the Hugging Face `Trainer` for easy training, evaluation, and saving. Unlike fine-tuning, this project does not start from a pre-trained model: the Transformer weights are initialized randomly and trained end-to-end on a small custom dataset.
## 📂 Features

- **Custom GPT architecture**: written in pure PyTorch
- **From-scratch training**: no pre-trained weights
- **Hugging Face `Trainer` integration**: handles the training loop, evaluation, and logging
- **Tokenizer compatibility**: reuses the GPT-2 tokenizer for convenience
- **Safetensors format**: safe, portable model checkpointing
- **Tiny dataset**: quick training for learning purposes
## 📜 How it Works

- **`SimpleGPTConfig`**: stores the model hyperparameters
- **`CausalSelfAttention`**: implements causally masked multi-head self-attention
- **`Block`**: a Transformer block with LayerNorm, attention, and a feed-forward network (see the sketch after this list)
- **`SimpleGPTLMHeadModel`**: the complete GPT model with a language-modeling head
- **Trainer setup**: defines the dataset, tokenizer, data collator, and training arguments
- **Training & saving**: the trained model is saved as `model.safetensors`
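The exact implementation lives in `train.py`. As a rough illustration, the attention module and Transformer block described above typically look something like the following minimal PyTorch sketch (the module names match the list above, but the hyperparameters and details are assumptions, not the repository's exact code):

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Multi-head self-attention with a causal mask (illustrative sketch)."""
    def __init__(self, n_embd=128, n_head=4, block_size=128):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.qkv = nn.Linear(n_embd, 3 * n_embd)   # fused query/key/value projection
        self.proj = nn.Linear(n_embd, n_embd)      # output projection
        # lower-triangular mask: each position may only attend to earlier positions
        self.register_buffer("mask", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # reshape to (batch, heads, seq_len, head_dim)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / math.sqrt(k.size(-1))
        att = att.masked_fill(self.mask[:T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        y = (att @ v).transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y)

class Block(nn.Module):
    """Pre-norm Transformer block: LayerNorm -> attention -> LayerNorm -> MLP."""
    def __init__(self, n_embd=128, n_head=4, block_size=128):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = CausalSelfAttention(n_embd, n_head, block_size)
        self.ln2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd), nn.GELU(), nn.Linear(4 * n_embd, n_embd)
        )

    def forward(self, x):
        x = x + self.attn(self.ln1(x))   # residual connection around attention
        x = x + self.mlp(self.ln2(x))    # residual connection around the MLP
        return x
```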
## 🚀 Getting Started

### 1️⃣ Install dependencies

```bash
pip install torch transformers datasets accelerate safetensors
```
### 2️⃣ Train the model

```bash
python train.py
```

This trains the model on a small text dataset and saves it to `./mini_custom_transformer_safetensors`.
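For reference, the `Trainer` wiring in `train.py` looks roughly like the sketch below. The dataset, hyperparameters, and output paths are placeholders, and a tiny randomly initialized `GPT2LMHeadModel` stands in for the repository's custom `SimpleGPTLMHeadModel` so the snippet runs on its own:

```python
from datasets import Dataset
from transformers import (AutoTokenizer, DataCollatorForLanguageModeling,
                          GPT2Config, GPT2LMHeadModel, Trainer, TrainingArguments)

# GPT-2 tokenizer reused purely for convenience; give it a pad token for batching.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

# Tiny toy dataset (placeholder text).
texts = ["hello world", "training a transformer from scratch"]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=128),
    remove_columns=["text"],
)

# Stand-in model; in this repo it would be SimpleGPTLMHeadModel(SimpleGPTConfig(...)).
model = GPT2LMHeadModel(GPT2Config(n_layer=2, n_head=2, n_embd=128, n_positions=128))

args = TrainingArguments(
    output_dir="./mini_custom_transformer_safetensors",
    per_device_train_batch_size=2,
    num_train_epochs=3,
    save_safetensors=True,   # write model.safetensors instead of pytorch_model.bin
    report_to="none",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("./mini_custom_transformer_safetensors")
tokenizer.save_pretrained("./mini_custom_transformer_safetensors")
```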
## 🗂 Repository Structure

```text
├── train.py                               # Main training script
├── README.md                              # Project documentation
└── mini_custom_transformer_safetensors/
    ├── config.json
    ├── model.safetensors
    └── tokenizer.json
```
## 💡 Why Safetensors?

- **Security**: avoids the arbitrary-code-execution risk of pickle-based `.bin` checkpoints (see the loading snippet below)
- **Speed**: faster loading on CPU and GPU
- **Interoperability**: works with Hugging Face models out of the box
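As a quick illustration, a `.safetensors` checkpoint is just a flat dictionary of named tensors and can be inspected without executing any pickled code. The path below assumes the default output directory from the training step:

```python
from safetensors.torch import load_file

# Load the raw tensor dictionary; no pickle code is executed.
state_dict = load_file("./mini_custom_transformer_safetensors/model.safetensors")

for name, tensor in state_dict.items():
    print(name, tuple(tensor.shape), tensor.dtype)
```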
## 📌 Notes

- This is a learning example and is not intended for production-level performance.
- Because it trains from scratch on a tiny dataset, output quality will be limited.
- You can expand the dataset and train longer for better results.
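## ✍️ Example Inference (sketch)

The repository does not currently ship an inference script. The snippet below is a rough sketch of how one could load `model.safetensors` and greedily generate a few tokens. It assumes that `SimpleGPTConfig` and `SimpleGPTLMHeadModel` are importable from `train.py` with the same hyperparameters used for training, and that the model's forward pass returns logits of shape `(batch, seq_len, vocab_size)`; adapt it to the actual code as needed.

```python
import torch
from safetensors.torch import load_file
from transformers import AutoTokenizer

from train import SimpleGPTConfig, SimpleGPTLMHeadModel  # assumed importable

out_dir = "./mini_custom_transformer_safetensors"
tokenizer = AutoTokenizer.from_pretrained(out_dir)

# Rebuild the architecture with the training-time hyperparameters, then load the weights.
model = SimpleGPTLMHeadModel(SimpleGPTConfig())
model.load_state_dict(load_file(f"{out_dir}/model.safetensors"))
model.eval()

input_ids = tokenizer("Once upon a time", return_tensors="pt").input_ids

# Simple greedy decoding loop (assumes the forward pass returns raw logits).
with torch.no_grad():
    for _ in range(20):
        logits = model(input_ids)
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=1)

print(tokenizer.decode(input_ids[0]))
```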
## 📜 License
MIT License — feel free to use, modify, and share.