---
license: mit
datasets:
- fka/awesome-chatgpt-prompts
language:
- en
metrics:
- character
pipeline_tag: text-generation
library_name: transformers
tags:
- code
---

## 🧠 Custom GPT from Scratch (Saved in Safetensors)

This repository contains a minimal GPT-style Transformer built entirely from scratch in PyTorch and integrated with the Hugging Face `Trainer` for easy training, evaluation, and saving.
Unlike fine-tuning, this project does not start from a pre-trained model: the Transformer weights are initialized randomly and trained entirely on a small custom dataset.

## 📂 Features

- **Custom GPT architecture**: written in pure PyTorch
- **From-scratch training**: no pre-trained weights
- **Hugging Face Trainer integration**: handles the training loop, evaluation, and logging
- **Tokenizer compatibility**: uses the GPT-2 tokenizer for convenience
- **Safetensors format**: safe, portable model checkpointing
- **Tiny dataset**: quick training for learning purposes

## 📜 How It Works

- `SimpleGPTConfig`: stores the model hyperparameters
- `CausalSelfAttention`: implements causally masked multi-head self-attention
- `Block`: a Transformer block with LayerNorm, attention, and a feed-forward network
- `SimpleGPTLMHeadModel`: the complete GPT model with a language-modeling head
- Trainer setup: defines the dataset, tokenizer, data collator, and training arguments
- Training & saving: the trained model is saved as `model.safetensors`
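
For orientation, here is a condensed sketch of how these components fit together. It is illustrative only: the class names come from the list above, but the hyperparameters, layer sizes, and implementation details are assumptions rather than the exact contents of `train.py`.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleGPTConfig:
    """Plain container for the hyperparameters (the values here are placeholders)."""
    def __init__(self, vocab_size=50257, block_size=128, n_layer=4, n_head=4, n_embd=256):
        self.vocab_size = vocab_size
        self.block_size = block_size
        self.n_layer = n_layer
        self.n_head = n_head
        self.n_embd = n_embd


class CausalSelfAttention(nn.Module):
    """Multi-head self-attention with a lower-triangular (causal) mask."""
    def __init__(self, config):
        super().__init__()
        self.n_head = config.n_head
        self.qkv = nn.Linear(config.n_embd, 3 * config.n_embd)
        self.proj = nn.Linear(config.n_embd, config.n_embd)
        mask = torch.tril(torch.ones(config.block_size, config.block_size))
        self.register_buffer("mask", mask.view(1, 1, config.block_size, config.block_size))

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / math.sqrt(k.size(-1))
        att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
        y = (F.softmax(att, dim=-1) @ v).transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y)


class Block(nn.Module):
    """Pre-norm Transformer block: attention and MLP, each with a residual connection."""
    def __init__(self, config):
        super().__init__()
        self.ln1 = nn.LayerNorm(config.n_embd)
        self.attn = CausalSelfAttention(config)
        self.ln2 = nn.LayerNorm(config.n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(config.n_embd, 4 * config.n_embd),
            nn.GELU(),
            nn.Linear(4 * config.n_embd, config.n_embd),
        )

    def forward(self, x):
        x = x + self.attn(self.ln1(x))
        return x + self.mlp(self.ln2(x))


class SimpleGPTLMHeadModel(nn.Module):
    """Token + position embeddings, a stack of Blocks, and a language-modeling head."""
    def __init__(self, config):
        super().__init__()
        self.tok_emb = nn.Embedding(config.vocab_size, config.n_embd)
        self.pos_emb = nn.Embedding(config.block_size, config.n_embd)
        self.blocks = nn.ModuleList(Block(config) for _ in range(config.n_layer))
        self.ln_f = nn.LayerNorm(config.n_embd)
        self.lm_head = nn.Linear(config.n_embd, config.vocab_size, bias=False)

    def forward(self, input_ids, attention_mask=None, labels=None):
        # attention_mask is accepted (the HF data collator passes it) but unused here.
        B, T = input_ids.shape
        pos = torch.arange(T, device=input_ids.device)
        x = self.tok_emb(input_ids) + self.pos_emb(pos)
        for block in self.blocks:
            x = block(x)
        logits = self.lm_head(self.ln_f(x))
        loss = None
        if labels is not None:
            # Shift so each position predicts the next token; -100 labels are ignored.
            loss = F.cross_entropy(
                logits[:, :-1].reshape(-1, logits.size(-1)),
                labels[:, 1:].reshape(-1),
            )
        return {"loss": loss, "logits": logits}
```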

## 🚀 Getting Started

**1️⃣ Install dependencies**

```bash
pip install torch transformers datasets accelerate safetensors
```

**2️⃣ Train the model**

```bash
python train.py
```

This trains on a small text dataset and saves the model to `./mini_custom_transformer_safetensors/`.
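
To make the Trainer wiring concrete, here is a hypothetical sketch of what the setup can look like, reusing the `SimpleGPTConfig` and `SimpleGPTLMHeadModel` classes sketched above. The sample texts, hyperparameters, and argument values are placeholders; the actual `train.py` may differ.

```python
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# GPT-2 tokenizer for convenience; it has no pad token, so reuse EOS.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

# Placeholder tiny dataset; replace with your own text.
texts = ["hello world", "tiny models train fast", "a transformer built from scratch"]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=128),
    remove_columns=["text"],
)

config = SimpleGPTConfig(vocab_size=tokenizer.vocab_size)
model = SimpleGPTLMHeadModel(config)

args = TrainingArguments(
    output_dir="./mini_custom_transformer_safetensors",
    per_device_train_batch_size=2,
    num_train_epochs=3,
    logging_steps=10,
    save_safetensors=True,   # write checkpoints as model.safetensors
    report_to="none",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    # mlm=False copies input_ids into labels for causal language modeling.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()

trainer.save_model("./mini_custom_transformer_safetensors")        # model.safetensors
tokenizer.save_pretrained("./mini_custom_transformer_safetensors")  # tokenizer.json
```

The actual script presumably also writes the `config.json` shown in the repository structure, for example by subclassing `PreTrainedModel`/`PretrainedConfig` or dumping the hyperparameters manually; that step is omitted from this sketch.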

## 🗂 Repository Structure

```text
├── train.py                             # Main training script
├── README.md                            # Project documentation
└── mini_custom_transformer_safetensors/
    ├── config.json
    ├── model.safetensors
    └── tokenizer.json
```

## 💡 Why Safetensors?

- **Security**: avoids the arbitrary-code-execution risk of pickle-based `.bin` checkpoints
- **Speed**: faster loading on CPU and GPU
- **Interoperability**: works with Hugging Face models out of the box

## 📌 Notes

- This is a learning example, not intended for production-level performance.
- Because it trains from scratch on a tiny dataset, output quality will be limited.
- Expand the dataset and train longer for better results.

## 📜 License

MIT License. Feel free to use, modify, and share.

## 🔍 Example Inference

After training, you can load `model.safetensors` and generate text right away, so the repository covers both training and usage.
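
The snippet below is a hypothetical sketch: it assumes the model and config classes from `train.py` are importable (and that the script guards its training loop behind `if __name__ == "__main__"`), and that the tokenizer was saved into the output directory. Adjust the names and paths to match the actual script.

```python
import torch
from safetensors.torch import load_file
from transformers import AutoTokenizer

# Hypothetical import: the classes defined in train.py.
from train import SimpleGPTConfig, SimpleGPTLMHeadModel

save_dir = "./mini_custom_transformer_safetensors"
tokenizer = AutoTokenizer.from_pretrained(save_dir)

# Rebuild the architecture and load the safetensors weights into it.
config = SimpleGPTConfig(vocab_size=tokenizer.vocab_size)
model = SimpleGPTLMHeadModel(config)
model.load_state_dict(load_file(f"{save_dir}/model.safetensors"))
model.eval()

# Simple greedy decoding, one token at a time.
prompt = "Once upon a time"
input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
with torch.no_grad():
    for _ in range(50):
        logits = model(input_ids)["logits"]
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=1)

print(tokenizer.decode(input_ids[0], skip_special_tokens=True))
```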