---
license: mit
datasets:
- fka/awesome-chatgpt-prompts
language:
- en
metrics:
- accuracy
base_model:
- Qwen/Qwen-Image
new_version: openai/gpt-oss-120b
pipeline_tag: text-generation
library_name: transformers
tags:
- code
---
# Using a Trained Mini-GPT Model (Safetensors)

This guide explains how to **load a trained Mini-GPT model** saved in `safetensors` format and generate text with it.
It is written step by step for **learning and understanding**.

---

## 1️⃣ Install Required Packages

Make sure you have the necessary packages:

```bash
pip install torch transformers safetensors
```
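If you want to verify the installation and check whether a GPU is visible, a quick optional check (not required for the rest of the guide) looks like this:

```python
# Optional sanity check: confirm the packages import and whether a GPU is visible.
import torch
import transformers
import safetensors  # imported only to confirm the package is installed

print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())
```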
## 2️⃣ Load the Trained Model and Tokenizer

We saved our model earlier in `./mini_gpt_safetensor`.
Here’s how to load it:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "./mini_gpt_safetensor"  # path to your saved model

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)
tokenizer.pad_token = tokenizer.eos_token  # GPT-style models define no pad token by default

# Load model
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")  # uses the GPU if available
```

Note: Using `device_map="auto"` loads the model on the GPU if one is available, otherwise on the CPU.
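To confirm where the weights actually landed, a quick optional check (just a sketch, not part of the saved model) is:

```python
# Optional: see which device the model is on and how large it is.
print("Model device:", model.device)  # e.g. cuda:0 or cpu
print(f"Parameters: {sum(p.numel() for p in model.parameters()):,}")

# Explicit alternative to device_map="auto" (assumes the model fits on one device):
# device = "cuda" if torch.cuda.is_available() else "cpu"
# model = AutoModelForCausalLM.from_pretrained(model_path).to(device)
```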
## 3️⃣ Generate Text from a Prompt

Once the model is loaded, we can generate text using a simple function:

```python
def generate_text(prompt, max_length=50):
    # Tokenize the prompt and move it to the model's device
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    input_ids = input_ids.to(model.device)

    # Generate text
    output_ids = model.generate(
        input_ids,
        max_length=max_length,
        do_sample=True,                       # enable randomness
        top_k=50,                             # sample from the 50 most likely tokens
        top_p=0.95,                           # nucleus sampling
        temperature=0.7,                      # creativity factor
        num_return_sequences=1,
        pad_token_id=tokenizer.eos_token_id,  # silence the missing-pad-token warning
    )

    # Decode the output into readable text
    output_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return output_text
```
Tip:

- `do_sample=True` → sampled (random) outputs for more creative text
- `top_k` and `top_p` → limit which tokens can be sampled
- `temperature` → higher values = more creative, less predictable output

The short sketch below shows how these settings change the output in practice.
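It reuses the `model` and `tokenizer` loaded above; exact outputs will vary from run to run:

```python
prompt = "Deep learning is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding: deterministic, always picks the most likely next token.
greedy_ids = model.generate(**inputs, max_length=40, do_sample=False,
                            pad_token_id=tokenizer.eos_token_id)
print("greedy:", tokenizer.decode(greedy_ids[0], skip_special_tokens=True))

# Sampling with increasing temperature: outputs become more varied.
for temp in (0.5, 0.9, 1.3):
    sampled_ids = model.generate(**inputs, max_length=40, do_sample=True,
                                 top_k=50, top_p=0.95, temperature=temp,
                                 pad_token_id=tokenizer.eos_token_id)
    print(f"temperature={temp}:", tokenizer.decode(sampled_ids[0], skip_special_tokens=True))
```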
## 4️⃣ Test Text Generation

Use your function with any prompt:

```python
prompt = "Hello, I am training a mini GPT model"
generated_text = generate_text(prompt, max_length=50)

print("\n📝 Generated text:")
print(generated_text)
```
Example output:

```text
Hello, I am training a mini GPT model to generate simple sentences about Python, deep learning, and AI projects.
```
## ✅ Summary

- Load the tokenizer and model from the safetensors folder.
- Use `generate` with suitable sampling parameters for creative text.
- Decode the output to get readable text.
- Experiment with `prompt`, `max_length`, `top_k`, `top_p`, and `temperature` to control text generation (see the sketch below).

By following this guide, you can load any trained Mini-GPT model and generate text interactively.
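If you want to make that experimentation easier, one optional variant (a sketch, not part of the original code) exposes the sampling settings as arguments:

```python
# Hypothetical helper, not part of the original guide.
def generate_text_flexible(prompt, max_length=50, top_k=50, top_p=0.95, temperature=0.7):
    """Like generate_text, but with the sampling knobs exposed as arguments."""
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    output_ids = model.generate(
        input_ids,
        max_length=max_length,
        do_sample=True,
        top_k=top_k,
        top_p=top_p,
        temperature=temperature,
        pad_token_id=tokenizer.eos_token_id,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(generate_text_flexible("Hello, I am training a mini GPT model", temperature=1.0))
```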