DataChef: Cooking Up Optimal Data Recipes for LLM Adaptation via Reinforcement Learning Paper • 2602.11089 • Published about 12 hours ago • 9
CLI-Gym: Scalable CLI Task Generation via Agentic Environment Inversion Paper • 2602.10999 • Published about 13 hours ago • 7
When to Memorize and When to Stop: Gated Recurrent Memory for Long-Context Reasoning Paper • 2602.10560 • Published about 23 hours ago • 10
Code2World: A GUI World Model via Renderable Code Generation Paper • 2602.09856 • Published 1 day ago • 174
SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning Paper • 2602.08234 • Published 3 days ago • 58
Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models Paper • 2602.02185 • Published 10 days ago • 125
Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models Paper • 2601.22060 • Published 13 days ago • 150
ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas Paper • 2601.21558 • Published 14 days ago • 58
daVinci-Dev: Agent-native Mid-training for Software Engineering Paper • 2601.18418 • Published 17 days ago • 124
Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model Paper • 2601.15892 • Published 21 days ago • 53
DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation Paper • 2601.09688 • Published 28 days ago • 126
The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning Paper • 2601.06002 • Published Jan 9 • 52
EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis Paper • 2601.05808 • Published Jan 9 • 36
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss Paper • 2512.23447 • Published Dec 29, 2025 • 98