---
library_name: transformers
license: apache-2.0
datasets:
- PrimeIntellect/Reverse-Text-SFT
base_model:
- PrimeIntellect/Qwen3-0.6B
---

# Qwen3-0.6B-Reverse-Text-SFT
A debug model fine-tuned on `willcb/R1-reverse-wikipedia-paragraphs-v1-1000`. To be used as a warmed-up model for RL in `vf-reverse-text`.
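
For reference, here is a minimal inference sketch using `transformers`. The repo id and the chat-style prompt format are assumptions (the card does not specify them), so adjust as needed:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id, inferred from the card title; adjust if the model
# is hosted under a different name.
model_id = "PrimeIntellect/Qwen3-0.6B-Reverse-Text-SFT"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# A chat-style prompt is assumed here, since the base model is a
# chat-tuned Qwen3 and the SFT data pairs paragraphs with reversals.
messages = [{"role": "user", "content": "Reverse the following text: hello world"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```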
Created with this training command from prime-rl (commit hash: `8262560`):
```bash
uv run torchrun --nproc-per-node 8 src/prime_rl/trainer/sft/train.py \
  --model.name PrimeIntellect/Qwen3-0.6B \
  --data.name willcb/R1-reverse-wikipedia-paragraphs-v1-1000 \
  --max-steps 100 \
  --data.batch-size 16 \
  --data.micro-batch-size 1 \
  --data.seq-len 4096 \
  --optim.lr 2e-5
```
Check out the run on W&B.