Learnable Multipliers: Freeing the Scale of Language Model Matrix Layers • Paper • 2601.04890 • Published 18 days ago • 41
Nested Learning: The Illusion of Deep Learning Architectures • Paper • 2512.24695 • Published 26 days ago • 40
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss • Paper • 2512.23447 • Published 28 days ago • 95
Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space • Paper • 2512.24617 • Published 26 days ago • 61