Revisiting the Shape Convention of Transformer Language Models Paper • 2602.06471 • Published 3 days ago • 3
Length-Unbiased Sequence Policy Optimization: Revealing and Controlling Response Length Variation in RLVR Paper • 2602.05261 • Published 5 days ago • 47
Horizon-LM: A RAM-Centric Architecture for LLM Training Paper • 2602.04816 • Published 5 days ago • 16
Quartet II: Accurate LLM Pre-Training in NVFP4 by Improved Unbiased Gradient Estimation Paper • 2601.22813 • Published 10 days ago • 55
Linear representations in language models can change dramatically over a conversation Paper • 2601.20834 • Published 12 days ago • 21
Scaling Embeddings Outperforms Scaling Experts in Language Models Paper • 2601.21204 • Published 12 days ago • 98
CGPT: Cluster-Guided Partial Tables with LLM-Generated Supervision for Table Retrieval Paper • 2601.15849 • Published 18 days ago • 14
Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability Paper • 2601.18778 • Published 14 days ago • 40
Introducing Waypoint-1: Real-time interactive video diffusion from Overworld Article • 21 days ago • 37
Waypoint 1 Small 🎮 Explore and navigate through AI-generated worlds in real-time Featured • 62
Towards Automated Kernel Generation in the Era of LLMs Paper • 2601.15727 • Published 18 days ago • 18