HuggingFaceFW/fineweb
What happens if you take the CinnabarLM idea and push it FURTHER? You'll get this!
PotentSulfurLM 500K is a tiny, 500K-parameter LLM trained for ~2 hours on a T4 GPU on Colab. The run stopped short of the planned 10k steps, because finishing them would have taken ~3 hours! It's only 2.2 MB in size and it's Llama-based!
Because it's a good idea to make tiny LLMs. Some people already did it with MicroLM, Spark 4 5M and Tenete 8M, but I hadn't yet!
| Parameter | Value |
|---|---|
| Tokenizer | Llama 3's tokenizer (Tiktoken / BPE) |
| Vocabulary Size | 2048 tokens |
| Batch Size | 16 x 16 = 256 |
| Context Window | Maybe 2048 tokens |
| `hidden_size` | 96 |
| `intermediate_size` | 96 |
| `num_hidden_layers` | 3 |
| `num_attention_heads` | 3 |
| `max_position_embeddings` | 2048 |
| `rms_norm_eps` | 1e-5 |
| `initializer_range` | 0.02 |
| `use_cache` | True |
| `tie_word_embeddings` | False |
| `rope_theta` | 10000.0 |
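
As a rough sanity check on the "500K" and "2.2 MB" figures, the parameter count can be worked out by hand from the values in the table. This is only a sketch: it assumes the standard Llama layout with no bias terms and no grouped-query attention (key/value heads equal to attention heads), which the card does not state explicitly. The function name `count_llama_parameters` is made up for this example.

```python
# Values copied from the configuration table above.
VOCAB_SIZE = 2048
HIDDEN_SIZE = 96
INTERMEDIATE_SIZE = 96
NUM_LAYERS = 3
TIE_WORD_EMBEDDINGS = False


def count_llama_parameters(vocab, hidden, intermediate, layers, tied):
    """Count weights in a Llama-style decoder.

    Assumption (not stated in the card): no biases and no grouped-query
    attention, so every attention projection is hidden x hidden.
    """
    embed = vocab * hidden                      # token embedding matrix
    lm_head = 0 if tied else vocab * hidden     # separate output head when untied
    attention = 4 * hidden * hidden             # q_proj, k_proj, v_proj, o_proj
    mlp = 3 * hidden * intermediate             # gate_proj, up_proj, down_proj
    norms = 2 * hidden                          # two RMSNorm weight vectors per layer
    final_norm = hidden                         # RMSNorm before the output head
    return embed + lm_head + layers * (attention + mlp + norms) + final_norm


total = count_llama_parameters(
    VOCAB_SIZE, HIDDEN_SIZE, INTERMEDIATE_SIZE, NUM_LAYERS, TIE_WORD_EMBEDDINGS
)
print(f"parameters: {total:,}")                   # parameters: 587,424
print(f"fp32 size:  {total * 4 / 2**20:.2f} MiB")  # fp32 size:  2.24 MiB
```

Under these assumptions the two untied embedding matrices (2 x 196,608 weights) account for about two thirds of the model, and the fp32 size lands close to the 2.2 MB quoted above.
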
| Hyperparameter | Value |
|---|---|
| `output_dir` | "./cinnabarlm-v2" |
| `max_steps` | 10000 |
| `per_device_train_batch_size` | 16 |
| `gradient_accumulation_steps` | 16 |
| `learning_rate` | 6e-4 |
| `weight_decay` | 0.01 |
| `warmup_steps` | 500 |
| `lr_scheduler_type` | "cosine" |
| `logging_steps` | 100 |
| `save_steps` | 2000 |
| `fp16` | True |
| `save_total_limit` | 2 |
| `prediction_loss_only` | True |
| `logging_first_step` | True |
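
To make the schedule in the table concrete, here is a small sketch of how the effective batch size and the learning rate work out. It assumes a single GPU and the usual "linear warmup, then half-cosine decay to zero" shape for a `"cosine"` scheduler; it is a plain-Python approximation written for this card, not the training code itself, and `lr_at` is a made-up helper name.

```python
import math

# Values copied from the hyperparameter table above.
MAX_STEPS = 10_000
WARMUP_STEPS = 500
PEAK_LR = 6e-4
PER_DEVICE_BATCH = 16
GRAD_ACCUM = 16


def lr_at(step, peak=PEAK_LR, warmup=WARMUP_STEPS, total=MAX_STEPS):
    """Linear warmup to `peak`, then cosine decay to zero at `total`.

    Assumption: this is the shape of the "cosine" schedule; the real
    scheduler lives in the training library.
    """
    if step < warmup:
        return peak * step / warmup
    progress = (step - warmup) / max(1, total - warmup)
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))


# Sequences consumed per optimizer step, assuming one GPU.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM
print("effective batch size:", effective_batch)

for step in (0, 250, 500, 5250, 10_000):
    print(f"step {step:>6}: lr = {lr_at(step):.2e}")
```

Because the schedule is defined over all 10,000 steps, a run that stops early ends with the learning rate still above zero, at whatever value the curve had reached by the last completed step.
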