Nemotron-Cascade Collection Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models • 18 items • Updated 3 days ago • 41
DeepMMSearch-R1: Empowering Multimodal LLMs in Multimodal Web Search Paper • 2510.12801 • Published Oct 14, 2025 • 13
SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities Paper • 2502.12025 • Published Feb 17, 2025 • 3
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification Paper • 2508.05629 • Published Aug 7, 2025 • 180
gpt-oss Collection Open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. • 2 items • Updated Aug 7, 2025 • 396