ValueFX9507/Tifa-DeepsexV2-7b-MGRPO-GGUF-Q4 • Reinforcement Learning • 8B • Updated Mar 26 • 9.05k • 221
hkust-nlp/Qwen-2.5-7B-Verifier-R1-Verifier-1.5B • Reinforcement Learning • 8B • Updated May 28 • 12 • 1