Seongyun/DeepSeek-R1-Distill-Qwen-1.5B-GRPO_mcqa_repetition_penalty_2 • Text Generation • 2B • Updated Mar 8, 2025 • 7
Seongyun/DeepSeek-R1-Distill-Qwen-1.5B-GRPO_pref_repetition_penalty • Text Generation • 2B • Updated Mar 1, 2025 • 3