Rethinking Global Text Conditioning in Diffusion Transformers Paper • 2602.09268 • Published 24 days ago • 8
Quartet II: Accurate LLM Pre-Training in NVFP4 by Improved Unbiased Gradient Estimation Paper • 2601.22813 • Published Jan 30 • 57
MemoryRewardBench: Benchmarking Reward Models for Long-Term Memory Management in Large Language Models Paper • 2601.11969 • Published Jan 17 • 27
ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning Paper • 2502.01100 • Published Feb 3, 2025 • 21
Alchemist: Turning Public Text-to-Image Data into Generative Gold Paper • 2505.19297 • Published May 25, 2025 • 84
Quartet: Native FP4 Training Can Be Optimal for Large Language Models Paper • 2505.14669 • Published May 20, 2025 • 78
Learning Adaptive Parallel Reasoning with Language Models Paper • 2504.15466 • Published Apr 21, 2025 • 44
PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters Paper • 2504.08791 • Published Apr 7, 2025 • 139
Hogwild! Inference: Parallel LLM Generation via Concurrent Attention Paper • 2504.06261 • Published Apr 8, 2025 • 110 • 6
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought Paper • 2501.04682 • Published Jan 8, 2025 • 99