PotentSulfurLM 500K

What happens if you take the CinnabarLM idea and push it FURTHER? You get this!

PotentSulfurLM 500K is a tiny, roughly 500K-parameter LLM (587k by the checkpoint's count) trained for ~2 hours on a T4 GPU on Colab. The run stopped short of the full 10k steps, which would have taken ~3 hours. It's only 2.2 MB in size and it's Llama-based!

Why?

Because it's a good idea to make tiny LLMs. Others have already done it with MicroLM, Spark 4 5M and Tenete 8M, but I hadn't yet!

Model Configurations

Parameter                 Value
Tokenizer                 Llama 3's tokenizer (Tiktoken / BPE)
Vocabulary Size           2048 tokens
Batch Size                16 x 16 = 256 (per-device batch x gradient accumulation steps)
Context Window            Up to 2048 tokens (set by max_position_embeddings)
hidden_size               96
intermediate_size         96
num_hidden_layers         3
num_attention_heads       3
max_position_embeddings   2048
rms_norm_eps              1e-5
initializer_range         0.02
use_cache                 True
tie_word_embeddings       False
rope_theta                10000.0
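
The values above are enough to estimate the parameter count by hand. Here's a minimal sketch, assuming a standard Llama-style layout with no bias terms and as many key/value heads as attention heads (neither is stated in this card). The function name `llama_param_count` is made up for illustration.

```python
# Rough parameter count for a Llama-style decoder, using the table's values.
# Assumptions (not stated in the card): no bias terms, and the number of
# key/value heads equals num_attention_heads (no grouped-query attention).

def llama_param_count(vocab_size, hidden_size, intermediate_size,
                      num_hidden_layers, tie_word_embeddings):
    embed = vocab_size * hidden_size                      # input embedding matrix
    lm_head = 0 if tie_word_embeddings else vocab_size * hidden_size
    attention = 4 * hidden_size * hidden_size             # q, k, v, o projections
    mlp = 3 * hidden_size * intermediate_size             # gate, up, down projections
    norms = 2 * hidden_size                               # two RMSNorm weights per layer
    per_layer = attention + mlp + norms
    final_norm = hidden_size                              # RMSNorm before the LM head
    return embed + lm_head + num_hidden_layers * per_layer + final_norm


total = llama_param_count(
    vocab_size=2048,
    hidden_size=96,
    intermediate_size=96,
    num_hidden_layers=3,
    tie_word_embeddings=False,
)
size_mib = total * 4 / 2**20  # 4 bytes per parameter in F32

print(f"{total:,} parameters, about {size_mib:.2f} MiB in F32")
```

Under those assumptions the total comes out to 587,424, which lines up with the 587k figure reported for the checkpoint. Note that about two thirds of it sits in the two untied embedding matrices (2 x 2048 x 96 = 393,216), so the three transformer layers themselves are under 200K parameters.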

Training Configurations

Hyperparameter                 Value
output_dir                     "./cinnabarlm-v2"
max_steps                      10000
per_device_train_batch_size    16
gradient_accumulation_steps    16
learning_rate                  6e-4
weight_decay                   0.01
warmup_steps                   500
lr_scheduler_type              "cosine"
logging_steps                  100
save_steps                     2000
fp16                           True
save_total_limit               2
prediction_loss_only           True
logging_first_step             True
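
To get a feel for what `learning_rate`, `warmup_steps`, `lr_scheduler_type` and `max_steps` do together, here's a small sketch of the learning rate at a given step. It assumes the usual shape a "cosine" scheduler takes with warmup (linear ramp to the peak, then a half-cosine decay to zero at `max_steps`); the helper name `lr_at` is hypothetical and this is not the actual training code.

```python
import math

# Sketch of a linear-warmup + cosine-decay schedule using the values above.
# Assumption: the rate decays to zero exactly at max_steps (half a cosine cycle).

def lr_at(step, base_lr=6e-4, warmup_steps=500, max_steps=10000):
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup steps.
        return base_lr * step / warmup_steps
    # Fraction of the post-warmup schedule that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / (max_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Effective batch size: sequences per optimizer step.
effective_batch = 16 * 16  # per_device_train_batch_size * gradient_accumulation_steps

for step in (0, 250, 500, 5250, 10000):
    print(step, f"{lr_at(step):.2e}")
```

The rate peaks at 6e-4 at step 500 and is back to half of that at step 5250, the midpoint of the decay. Since the schedule is tied to `max_steps`, a run that stops before step 10000 ends with a learning rate that hasn't fully decayed yet.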

Limitations

  • Not Instruction-Tuned: It's only a base model, so it only completes text.
  • English-Only: It's trained on English data (FineWeb), so it's NOT multilingual.
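
Since it's a base model, the way to use it is plain text completion: give it the start of some English text and let it continue. Here's a minimal sketch, assuming the checkpoint loads with the standard `transformers` Auto classes (not confirmed in this card) and that `transformers` and `torch` are installed. The repo id is the one shown on the model page; the `complete` helper and the sampling settings are just illustrative.

```python
# Minimal text-completion sketch. Assumption: the repo loads with the
# standard Auto classes; the card itself doesn't give loading instructions.

MODEL_ID = "MihaiPopa-1/PotentSulfurLM-500K-Base"  # repo id from the model page


def complete(prompt, max_new_tokens=40):
    # Imported here so that defining the helper doesn't require transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,
        do_sample=True,   # sampling settings are illustrative, not tuned
        top_k=50,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (downloads the model on first use):
# print(complete("The weather today is"))
```

Don't expect it to follow instructions or answer questions; prompts that read like the beginning of a web page sentence fit a base model like this best.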

Some other details

  • It's trained on ~200 million tokens of FineWeb (CC-MAIN-2025-26 snapshot), and the knowledge cutoff is June 2025.
  • I picked the name "PotentSulfurLM" by combining "Potent Sulfur" (the new block from the Chaos Cubed drop in Minecraft) with "LM" (Language Model).
  • The Safetensors checkpoint has 587k parameters, stored as F32.

Dataset used to train MihaiPopa-1/PotentSulfurLM-500K-Base: FineWeb (CC-MAIN-2025-26 snapshot)
