# GPT-2 Medium Somali — Text Generation Only

Short description: A Somali text generation model based on GPT‑2 Medium (≈345M parameters). Optimized for general Somali generation: headlines, news‑style sentences, short stories, and assistant‑like completions. This repository is intended only for generation use cases.

Suggested model ID: `FatihJimale/gpt2-medium-somali`
## 🔑 Key facts

- Architecture: GPT‑2 Medium (24 layers × 1024 hidden size × 16 heads, ~345M params)
- Objective: causal language modeling (next‑token prediction)
- Context length: 1024 tokens
- Tokenizer: GPT‑2 BPE (fast)
- Framework: 🤗 Transformers
- Precision: FP16/BF16 compatible at inference
## ✅ Intended use
- Somali text generation (stories, headlines, news‑style sentences, prompts)
- Assistant‑style completions in Somali
## ⚠️ Limitations
- May generate inaccurate, offensive, or biased content.
- Not suitable for factual QA without verification.
- Avoid safety‑critical usage.
## 🚀 Quick start (Transformers)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "FatihJimale/gpt2-medium-somali"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "qarax xoogan ayaa ka dhacay magaalada"
inputs = tok(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=80,
    do_sample=True,
    temperature=0.9,
    top_p=0.92,
    repetition_penalty=1.08,
)
print(tok.decode(outputs[0], skip_special_tokens=True))
```
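GPT‑2 checkpoints ship without a padding token, so batched prompts need one assigned first. A minimal sketch reusing `tok` and `model` from above (the second prompt is an illustrative placeholder):

```python
# GPT-2 has no pad token by default; reusing EOS for padding is a common workaround.
tok.pad_token = tok.eos_token
tok.padding_side = "left"  # left-pad so generation continues from the real prompt text

prompts = [
    "qarax xoogan ayaa ka dhacay magaalada",
    "magaalada Muqdisho",  # placeholder second prompt for illustration
]
batch = tok(prompts, return_tensors="pt", padding=True).to(model.device)
outputs = model.generate(**batch, max_new_tokens=40, pad_token_id=tok.pad_token_id)
for seq in outputs:
    print(tok.decode(seq, skip_special_tokens=True))
```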
### Inference tips

- If repetition appears, increase `repetition_penalty` to 1.1–1.2 or lower `temperature` (0.7–0.9).
- For more focused generations, reduce `max_new_tokens` and set `top_p` around 0.9.
- Deterministic output: set `do_sample=False` (with `top_k=None`, `temperature=1.0`); see the sketch below.
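For reproducible output, a minimal greedy‑decoding sketch reusing the objects from the quick start:

```python
# Greedy (deterministic) decoding: the same prompt always yields the same output.
outputs = model.generate(
    **inputs,
    max_new_tokens=80,
    do_sample=False,  # disables temperature/top_p sampling entirely
)
print(tok.decode(outputs[0], skip_special_tokens=True))
```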
## 🔧 Model details
- Training steps: ≈14,850 (completed at ~epoch 2.00)
- Epochs: 2
- Effective batch size: 64
- Learning rate & schedule: final logged LR ≈ 8.998e-10
- Optimizer: AdamW (β1=0.9, β2=0.999)
- Weight decay: 0.01
- Mixed precision: bf16
- Hardware: AWS `ml.g5.24xlarge` — 4× NVIDIA A10G (24 GB each), 96 vCPUs, 384 GiB RAM; data‑parallel across 4 GPUs
- Context length: 1024 tokens
- Tokenizer: GPT‑2 BPE (fast); no custom Somali tokenizer in this version
- Training date: 2025‑09‑25
- Runtime: evaluation runtime ≈ 1652.22 s (~27.5 min); overall training wall‑clock ≈ 1.337 days (≈ 32 h 05 m 17 s)
Note: Dataset specifics and cleaning steps are intentionally not disclosed here, per the author's request. This card focuses on model size, parameters, and usage.
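The training script itself is likewise not published. Purely as an illustration, the hyperparameters reported above could be expressed with 🤗 Transformers `TrainingArguments` roughly as follows; the per‑device batch size and accumulation steps are assumptions chosen to reproduce the effective batch size of 64 on 4 GPUs:

```python
from transformers import TrainingArguments

# Illustrative reconstruction only; unreported settings are assumptions.
args = TrainingArguments(
    output_dir="gpt2-medium-somali",  # hypothetical output path
    num_train_epochs=2,               # reported
    per_device_train_batch_size=4,    # assumption: 4 GPUs × 4 × 4 accum = 64 effective
    gradient_accumulation_steps=4,    # assumption (see above)
    weight_decay=0.01,                # reported
    adam_beta1=0.9,                   # reported
    adam_beta2=0.999,                 # reported
    bf16=True,                        # reported mixed precision
)
```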
## 📊 Evaluation
- Train loss (last logged): 1.8449 @ step 14850 (~epoch 2.00)
- Eval/validation loss: 1.78604
- Perplexity (valid/test): 5.9658 (final recorded value @ 2025‑09‑25 09:06:42)
- Eval runtime: 1652.22 s, 72.272 samples/s, 9.035 steps/s
- Human eval notes: TBD (fluency, coherence)
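As a sanity check, the reported perplexity matches the validation loss, since perplexity is the exponential of the mean cross‑entropy loss:

```python
import math

eval_loss = 1.78604
print(math.exp(eval_loss))  # ≈ 5.9658, the reported perplexity
```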
📁 Repo layout
config.json
pytorch_model.bin (or model.safetensors)
merges.txt
vocab.json
tokenizer.json
tokenizer_config.json
special_tokens_map.json (if any)
README.md (this file)
## 📣 Citation

```bibtex
@software{gpt2_medium_somali_2025,
  title  = {GPT-2 Medium Somali},
  author = {Mohamed Abdirizak Ahmed},
  year   = {2025},
  url    = {https://huggingface.co/FatihJimale/gpt2-medium-somali}
}
```
## 🔐 Safety

This model can produce hallucinations and harmful content. Use it with content filters and human review (a minimal sketch follows). Do not use it for medical, legal, or financial advice.
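As one hypothetical illustration (not a vetted safety solution), a simple blocklist gate could withhold generations for review; `BLOCKLIST` and its entries are placeholders:

```python
# Hypothetical post-generation gate; a real deployment needs a curated
# Somali blocklist plus human review, not just substring matching.
BLOCKLIST = {"example_flagged_term"}  # placeholder entries

def is_safe(text: str) -> bool:
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKLIST)

generated = tok.decode(outputs[0], skip_special_tokens=True)
if is_safe(generated):
    print(generated)
else:
    print("[generation withheld for human review]")
```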