
🧠 Custom GPT from Scratch — Saved in Safetensors

This repository contains a minimal GPT-style Transformer built entirely from scratch in PyTorch and integrated with the Hugging Face Trainer for training, evaluation, and checkpointing. Unlike fine-tuning, this project does not start from a pre-trained model: the Transformer weights are initialized randomly and trained solely on a small custom dataset.

📂 Features

  • Custom GPT architecture — written in pure PyTorch

  • From scratch training — no pre-trained weights

  • Hugging Face Trainer integration for training loop, evaluation, and logging

  • Tokenizer compatibility — reuses the GPT-2 tokenizer for convenience

  • Safetensors format — safe, portable model checkpointing

  • Tiny dataset — quick training for learning purposes

📜 How it Works

  • SimpleGPTConfig — stores model hyperparameters

  • CausalSelfAttention — implements causal masked multi-head self-attention (see the sketch after this list)

  • Block — Transformer block with LayerNorm, attention, and feed-forward network

  • SimpleGPTLMHeadModel — complete GPT model with language modeling head

  • Trainer setup — defines dataset, tokenizer, data collator, and training arguments

  • Training & saving — model is saved as model.safetensors
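
A minimal sketch of the causal attention mechanism, assuming a typical layout (layer names and hyperparameters here are illustrative; the exact implementation lives in train.py):

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Minimal causal multi-head self-attention (illustrative sketch)."""

    def __init__(self, n_embd: int, n_head: int, block_size: int):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.qkv = nn.Linear(n_embd, 3 * n_embd)   # joint projection to queries, keys, values
        self.proj = nn.Linear(n_embd, n_embd)      # output projection
        # lower-triangular mask so each position can only attend to earlier positions
        mask = torch.tril(torch.ones(block_size, block_size))
        self.register_buffer("mask", mask.view(1, 1, block_size, block_size))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # reshape to (batch, heads, seq_len, head_dim)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        # scaled dot-product attention with the causal mask applied
        att = (q @ k.transpose(-2, -1)) / math.sqrt(k.size(-1))
        att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        y = att @ v
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y)
```

For example, CausalSelfAttention(n_embd=64, n_head=2, block_size=128) maps a (batch, seq_len, 64) tensor to another tensor of the same shape while letting each position attend only to the past.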

🚀 Getting Started

1️⃣ Install dependencies

pip install torch transformers datasets accelerate safetensors

2️⃣ Train the model

python train.py

This will train on a small text dataset and save the model to ./mini_custom_transformer_safetensors.
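
For orientation, here is a rough sketch of the kind of wiring train.py sets up. The hyperparameters and the toy dataset below are placeholders, and GPT2LMHeadModel is only a stand-in for the repository's own SimpleGPTLMHeadModel; the point is that the model is built from a config with random weights, never loaded with from_pretrained:

```python
# Illustrative training setup; names and hyperparameters are assumptions, not the
# exact contents of train.py.
from datasets import Dataset
from transformers import (
    GPT2Config,
    GPT2LMHeadModel,            # stand-in here for the custom SimpleGPTLMHeadModel
    GPT2TokenizerFast,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 tokenizer has no pad token by default

texts = ["hello world", "a tiny dataset", "for a tiny model"]  # placeholder data
dataset = Dataset.from_dict({"text": texts}).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=64),
    remove_columns=["text"],
)

config = GPT2Config(n_layer=2, n_head=2, n_embd=64)   # small, randomly initialized
model = GPT2LMHeadModel(config)                        # no pre-trained weights loaded

args = TrainingArguments(
    output_dir="./mini_custom_transformer_safetensors",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    save_safetensors=True,     # checkpoint as model.safetensors
    report_to="none",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
)
trainer.train()
trainer.save_model()                        # writes config.json + model.safetensors
tokenizer.save_pretrained(args.output_dir)  # writes tokenizer.json
```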

🗂 Repository Structure
├── train.py                   # Main training script
├── README.md                  # Project documentation
└── mini_custom_transformer_safetensors/
    ├── config.json
    ├── model.safetensors
    └── tokenizer.json

💡 Why Safetensors?

  • Security — avoids the arbitrary code execution risk of pickle-based .bin checkpoints (see the save/load sketch below)

  • Speed — faster loading on CPU and GPU

  • Interoperability — works with Hugging Face models out of the box
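
As a quick illustration, tensors can be written and read with the safetensors library without any pickle deserialization (a toy sketch, not tied to this repo's weights):

```python
# Minimal sketch: saving and loading raw tensors with safetensors.
import torch
from safetensors.torch import save_file, load_file

state_dict = {"embedding.weight": torch.randn(10, 4)}   # toy example tensors
save_file(state_dict, "model.safetensors")               # no pickle involved

tensors = load_file("model.safetensors")                 # plain dict of tensors
print(tensors["embedding.weight"].shape)
```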

📌 Notes

  • This is a learning example, not intended for production-level performance.

  • Since it trains from scratch on a tiny dataset, output quality will be limited.

  • You can expand the dataset and train longer for better results.
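
🧪 Example Inference

Once training has produced ./mini_custom_transformer_safetensors, the weights can be loaded back for generation. The sketch below is an assumption-heavy outline: it presumes SimpleGPTConfig and SimpleGPTLMHeadModel are importable from train.py, that SimpleGPTConfig subclasses PretrainedConfig, and that the model's forward pass returns an object with a .logits field. Adjust the names to match the actual script.

```python
# Hypothetical inference sketch; class names and return types are assumptions.
import torch
from safetensors.torch import load_file
from transformers import GPT2TokenizerFast

from train import SimpleGPTConfig, SimpleGPTLMHeadModel  # assumed to be importable

ckpt_dir = "./mini_custom_transformer_safetensors"

tokenizer = GPT2TokenizerFast.from_pretrained(ckpt_dir)
config = SimpleGPTConfig.from_pretrained(ckpt_dir)        # assumes a PretrainedConfig subclass
model = SimpleGPTLMHeadModel(config)
model.load_state_dict(load_file(f"{ckpt_dir}/model.safetensors"))
model.eval()

input_ids = tokenizer("Hello", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(30):                                   # simple greedy decoding loop
        logits = model(input_ids=input_ids).logits        # assumes a .logits output field
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=-1)

print(tokenizer.decode(input_ids[0], skip_special_tokens=True))
```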

📜 License

MIT License — feel free to use, modify, and share.

