PRIME-RL
A debug model fine-tuned on willcb/R1-reverse-wikipedia-paragraphs-v1-1000. Intended as a warmed-up starting model for RL in vf-reverse-text.
Created with this training command from prime-rl (commit hash: 8262560):
```shell
uv run torchrun --nproc-per-node 8 src/prime_rl/trainer/sft/train.py \
  --model.name PrimeIntellect/Qwen3-0.6B \
  --data.name willcb/R1-reverse-wikipedia-paragraphs-v1-1000 \
  --max-steps 100 \
  --data.batch-size 16 \
  --data.micro-batch-size 1 \
  --data.seq-len 4096 \
  --optim.lr 2e-5
```
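For reference, a quick sketch of how these batch flags decompose across the 8 GPUs, assuming the conventional relationship between global batch size, micro-batch size, and gradient accumulation (this is an assumption about prime-rl's trainer, not confirmed by the source):

```python
# Hypothetical sketch of the batch decomposition implied by the flags above;
# prime-rl's actual trainer internals may differ.
def grad_accum_steps(batch_size: int, micro_batch_size: int, world_size: int) -> int:
    """Micro-batches each rank accumulates per optimizer step,
    assuming global batch = micro_batch * world_size * accum_steps."""
    per_step = micro_batch_size * world_size
    assert batch_size % per_step == 0, "batch size must divide evenly"
    return batch_size // per_step

# --data.batch-size 16, --data.micro-batch-size 1, --nproc-per-node 8
print(grad_accum_steps(16, 1, 8))  # -> 2 accumulation steps per optimizer step
```

With `--max-steps 100`, that works out to 100 optimizer steps over 1,600 sequences of up to 4,096 tokens each, under the same assumption.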
Check out the run on W&B.