# GPT-2 Medium Somali — Text Generation Only

Short description: A Somali text generation model based on GPT‑2 Medium (≈345M parameters). Optimized for general Somali generation: headlines, news‑style sentences, short stories, and assistant‑like completions. This repository is intended only for generation use cases.

Suggested model ID: `FatihJimale/gpt2-medium-somali`
## 🔑 Key facts

- Architecture: GPT‑2 Medium (24 layers × 1024 hidden size × 16 heads, ~345M params)
- Objective: causal language modeling (next‑token prediction)
- Context length: 1024 tokens
- Tokenizer: GPT‑2 BPE (fast)
- Framework: 🤗 Transformers
- Precision: FP16/BF16 compatible at inference
## ✅ Intended use
- Somali text generation (stories, headlines, news‑style sentences, prompts)
- Assistant‑style completions in Somali
## ⚠️ Limitations
- May generate inaccurate, offensive, or biased content.
- Not suitable for factual QA without verification.
- Avoid safety‑critical usage.
## 🚀 Quick start (Transformers)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "FatihJimale/gpt2-medium-somali"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "qarax xoogan ayaa ka dhacay magaalada"
inputs = tok(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=80,
    do_sample=True,
    temperature=0.9,
    top_p=0.92,
    repetition_penalty=1.08,
)
print(tok.decode(outputs[0], skip_special_tokens=True))
```
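GPT‑2 checkpoints ship without a padding token, so batched prompts need one assigned first. A minimal sketch reusing `tok` and `model` from above (the second prompt is an illustrative placeholder):

```python
# GPT-2 has no pad token by default; reusing EOS for padding is a common workaround.
tok.pad_token = tok.eos_token
tok.padding_side = "left"  # left-pad so generation continues from the real prompt text

prompts = [
    "qarax xoogan ayaa ka dhacay magaalada",
    "magaalada Muqdisho",  # placeholder second prompt for illustration
]
batch = tok(prompts, return_tensors="pt", padding=True).to(model.device)
outputs = model.generate(**batch, max_new_tokens=40, pad_token_id=tok.pad_token_id)
for seq in outputs:
    print(tok.decode(seq, skip_special_tokens=True))
```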
### Inference tips

- If repetition appears, increase `repetition_penalty` to 1.1–1.2 or lower `temperature` (0.7–0.9).
- For more focused generations, reduce `max_new_tokens` and set `top_p` around 0.9.
- Deterministic output: set `do_sample=False` (with `top_k=None`, `temperature=1.0`); see the sketch below.
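For reproducible output, a minimal greedy‑decoding sketch reusing the objects from the quick start:

```python
# Greedy (deterministic) decoding: the same prompt always yields the same output.
outputs = model.generate(
    **inputs,
    max_new_tokens=80,
    do_sample=False,  # disables temperature/top_p sampling entirely
)
print(tok.decode(outputs[0], skip_special_tokens=True))
```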
## 🔧 Model details
- Training steps: ≈14,850 (completed at ~epoch 2.00)
- Epochs: 2
- Effective batch size: 64
- Learning rate & schedule: final logged LR ≈ 8.998e-10
- Optimizer: AdamW (β1=0.9, β2=0.999)
- Weight decay: 0.01
- Mixed precision: bf16
- Hardware: AWS `ml.g5.24xlarge` — 4× NVIDIA A10G (24 GB each), 96 vCPUs, 384 GiB RAM; data‑parallel across 4 GPUs
- Context length: 1024 tokens
- Tokenizer: GPT‑2 BPE (fast); no custom Somali tokenizer in this version
- Training date: 2025‑09‑25
- Runtime: evaluation runtime ≈ 1652.22 s (~27.5 min); overall training wall‑clock ≈ 1.337 days (≈ 32 h 05 m 17 s)
Note: Dataset specifics and cleaning steps are intentionally not disclosed here, per the author's request. This card focuses on model size, parameters, and usage.
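The training script itself is likewise not published. Purely as an illustration, the hyperparameters reported above could be expressed with 🤗 Transformers `TrainingArguments` roughly as follows; the per‑device batch size and accumulation steps are assumptions chosen to reproduce the effective batch size of 64 on 4 GPUs:

```python
from transformers import TrainingArguments

# Illustrative reconstruction only; unreported settings are assumptions.
args = TrainingArguments(
    output_dir="gpt2-medium-somali",  # hypothetical output path
    num_train_epochs=2,               # reported
    per_device_train_batch_size=4,    # assumption: 4 GPUs × 4 × 4 accum = 64 effective
    gradient_accumulation_steps=4,    # assumption (see above)
    weight_decay=0.01,                # reported
    adam_beta1=0.9,                   # reported
    adam_beta2=0.999,                 # reported
    bf16=True,                        # reported mixed precision
)
```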
## 📊 Evaluation
- Train loss (last logged): 1.8449 @ step 14850 (~epoch 2.00)
- Eval/validation loss: 1.78604
- Perplexity (valid/test): 5.9658 (final recorded value @ 2025‑09‑25 09:06:42)
- Eval runtime: 1652.22 s, 72.272 samples/s, 9.035 steps/s
- Human eval notes: TBD (fluency, coherence)
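As a sanity check, the reported perplexity matches the validation loss, since perplexity is the exponential of the mean cross‑entropy loss:

```python
import math

eval_loss = 1.78604
print(math.exp(eval_loss))  # ≈ 5.9658, the reported perplexity
```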
📁 Repo layout
config.json
pytorch_model.bin (or model.safetensors)
merges.txt
vocab.json
tokenizer.json
tokenizer_config.json
special_tokens_map.json (if any)
README.md (this file)
## 📣 Citation

```bibtex
@software{gpt2_medium_somali_2025,
  title  = {GPT-2 Medium Somali},
  author = {Mohamed Abdirizak Ahmed},
  year   = {2025},
  url    = {https://huggingface.co/FatihJimale/gpt2-medium-somali}
}
```
## 🔐 Safety

This model can produce hallucinations and harmful content. Use it with content filters and human review (a minimal sketch follows). Do not use it for medical, legal, or financial advice.
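As one hypothetical illustration (not a vetted safety solution), a simple blocklist gate could withhold generations for review; `BLOCKLIST` and its entries are placeholders:

```python
# Hypothetical post-generation gate; a real deployment needs a curated
# Somali blocklist plus human review, not just substring matching.
BLOCKLIST = {"example_flagged_term"}  # placeholder entries

def is_safe(text: str) -> bool:
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKLIST)

generated = tok.decode(outputs[0], skip_special_tokens=True)
if is_safe(generated):
    print(generated)
else:
    print("[generation withheld for human review]")
```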