On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting Paper • 2508.11408 • Published 10 days ago • 6
NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model Paper • 2508.14444 • Published 5 days ago • 29
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents Paper • 2508.13186 • Published 11 days ago • 16
MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning Paper • 2507.16812 • Published Jul 22 • 62
AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training Paper • 2507.01663 • Published Jul 2 • 5
Kimina Prover Preview Collection State-of-the-Art Models for Formal Mathematical Reasoning • 5 items • Updated Apr 28 • 33