DARC: Decoupled Asymmetric Reasoning Curriculum for LLM Evolution Paper • 2601.13761 • Published 5 days ago • 15
Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning Paper • 2512.20848 • Published Dec 23, 2025 • 35
TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times Paper • 2512.16093 • Published Dec 18, 2025 • 94
Visual Backdoor Attacks on MLLM Embodied Decision Making via Contrastive Trigger Learning Paper • 2510.27623 • Published Oct 31, 2025 • 13
Critique-RL: Training Language Models for Critiquing through Two-Stage Reinforcement Learning Paper • 2510.24320 • Published Oct 28, 2025 • 21 • 3
LaSeR: Reinforcement Learning with Last-Token Self-Rewarding Paper • 2510.14943 • Published Oct 16, 2025 • 40
Stress Testing Generalization: How Minor Modifications Undermine Large Language Model Performance Paper • 2502.12459 • Published Feb 18, 2025 • 2
LaSeR: Reinforcement Learning with Last-Token Self-Rewarding Paper • 2510.14943 • Published Oct 16, 2025 • 40
LaSeR Collection Models from the paper "LaSeR: Reinforcement Learning with Last-Token Self-Rewarding" • 5 items • Updated Oct 17, 2025 • 1
LaSeR: Reinforcement Learning with Last-Token Self-Rewarding Paper • 2510.14943 • Published Oct 16, 2025 • 40 • 2
LaSeR Collection Models from the paper "LaSeR: Reinforcement Learning with Last-Token Self-Rewarding" • 5 items • Updated Oct 17, 2025 • 1