---
license: mit
datasets:
  - fka/awesome-chatgpt-prompts
language:
  - en
metrics:
  - character
base_model:
  - openai/gpt-oss-20b
new_version: openai/gpt-oss-20b
pipeline_tag: text-generation
library_name: transformers
tags:
  - code
---

🧠 Custom GPT from Scratch — Saved in Safetensors

This repository contains a minimal GPT-style Transformer built completely from scratch in PyTorch and integrated with the Hugging Face Trainer for easy training, evaluation, and saving. Unlike fine-tuning, this project does not start from a pre-trained model: the Transformer weights are initialized randomly and trained entirely on a small custom dataset.

📂 Features

  • Custom GPT architecture — written in pure PyTorch

  • From scratch training — no pre-trained weights

  • Hugging Face Trainer integration for training loop, evaluation, and logging

  • Tokenizer compatibility — uses the GPT-2 tokenizer for convenience

  • Safetensors format — safe, portable model checkpointing

  • Tiny dataset — quick training for learning purposes

📜 How it Works

  • SimpleGPTConfig — stores model hyperparameters

  • CausalSelfAttention — implements causal masked multi-head self-attention

  • Block — Transformer block with LayerNorm, attention, and feed-forward network

  • SimpleGPTLMHeadModel — complete GPT model with a language modeling head (a minimal sketch of these components follows this list)

  • Trainer setup — defines dataset, tokenizer, data collator, and training arguments

  • Training & saving — model is saved as model.safetensors

🚀 Getting Started

1️⃣ Install dependencies

pip install torch transformers datasets accelerate safetensors

2️⃣ Train the model

python train.py

This will train on a small text dataset and save the model to ./mini_custom_transformer_safetensors.
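
As a hedged sketch of what train.py is expected to do, the snippet below wires the illustrative classes above into the Hugging Face Trainer. The corpus, hyperparameters, and argument values are assumptions for demonstration, not the actual script:

```python
from datasets import Dataset
from transformers import (AutoTokenizer, DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

# Reuse the GPT-2 tokenizer purely for convenience; it has no pad token by default.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

# Tiny illustrative corpus -- replace with your own text.
texts = ["hello world, this is a tiny GPT", "transformers are fun to build from scratch"]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)

model = SimpleGPTLMHeadModel(SimpleGPTConfig(vocab_size=tokenizer.vocab_size))

args = TrainingArguments(
    output_dir="./mini_custom_transformer_safetensors",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    save_safetensors=True,   # write model.safetensors rather than a pickle-based .bin file
    report_to=[],
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model()                          # writes model.safetensors to output_dir
tokenizer.save_pretrained(args.output_dir)    # writes tokenizer.json alongside it
```
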

🗂 Repository Structure
├── train.py                   # Main training script
├── README.md                  # Project documentation
└── mini_custom_transformer_safetensors/
    ├── config.json
    ├── model.safetensors
    └── tokenizer.json

💡 Why Safetensors?

  • Security — avoids the arbitrary code execution risks of pickle-based .bin checkpoints

  • Speed — faster loading on CPU and GPU

  • Interoperability — works with Hugging Face models out of the box (see the small loading example below)
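
For example, the saved weights can be inspected without instantiating the model at all (the path comes from the repository structure above):

```python
from safetensors.torch import load_file

# Tensors are deserialized directly -- no pickle, so no arbitrary code can run.
state_dict = load_file("mini_custom_transformer_safetensors/model.safetensors")
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))
```
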

📌 Notes

  • This is a learning example, not intended for production-level performance.

  • Since it trains from scratch on a tiny dataset, output quality will be limited.

  • You can expand the dataset and train longer for better results.

📜 License

MIT License — feel free to use, modify, and share.

🔍 Example Inference

After training, you can load model.safetensors and generate text right away, so the repository covers both training and usage.
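
A minimal sketch, assuming the model classes are importable from train.py (i.e. its training code is guarded by `if __name__ == "__main__":`) and that the same hyperparameters are used at load time. The custom model has no generate() helper, so decoding is done with a simple greedy loop:

```python
import torch
from safetensors.torch import load_file
from transformers import AutoTokenizer

# Assumption: SimpleGPTConfig / SimpleGPTLMHeadModel can be imported from train.py.
from train import SimpleGPTConfig, SimpleGPTLMHeadModel

out_dir = "mini_custom_transformer_safetensors"
# Use AutoTokenizer.from_pretrained("gpt2") instead if the tokenizer was not saved locally.
tokenizer = AutoTokenizer.from_pretrained(out_dir)

model = SimpleGPTLMHeadModel(SimpleGPTConfig(vocab_size=tokenizer.vocab_size))
model.load_state_dict(load_file(f"{out_dir}/model.safetensors"))
model.eval()

# Greedy decoding: repeatedly append the most likely next token.
input_ids = tokenizer("Once upon a time", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(30):
        logits = model(input_ids)["logits"]
        next_id = logits[:, -1].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=1)

print(tokenizer.decode(input_ids[0], skip_special_tokens=True))
```

Because the model is trained from scratch on a tiny dataset, expect mostly incoherent output; the point of this script is to confirm that the safetensors checkpoint loads and generates end to end.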