Beyond Fixed: Variable-Length Denoising for Diffusion Large Language Models Paper • 2508.00819 • Published 23 days ago • 62
Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training Paper • 2508.00414 • Published 24 days ago • 87
SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution Paper • 2507.23348 • Published 25 days ago • 10
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination Paper • 2507.10532 • Published Jul 14 • 85
An Empirical Study of Using Large Language Models for Unit Test Generation Paper • 2305.00418 • Published Apr 30, 2023 • 2
TESTEVAL: Benchmarking Large Language Models for Test Case Generation Paper • 2406.04531 • Published Jun 6, 2024 • 1
TestBench: Evaluating Class-Level Test Case Generation Capability of Large Language Models Paper • 2409.17561 • Published Sep 26, 2024 • 1
ProjectTest: A Project-level LLM Unit Test Generation Benchmark and Impact of Error Fixing Mechanisms Paper • 2502.06556 • Published Feb 10 • 2
TestGenEval: A Real World Unit Test Generation and Test Completion Benchmark Paper • 2410.00752 • Published Oct 1, 2024 • 1
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities Paper • 2507.06261 • Published Jul 7 • 59
Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs Paper • 2507.09477 • Published Jul 13 • 80
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning Paper • 2507.00432 • Published Jul 1 • 74