Collections
Discover the best community collections!
Collections including paper arxiv:2506.21734
- Hierarchical Reasoning Model
  Paper • 2506.21734 • Published • 30
- Dynamic Chunking for End-to-End Hierarchical Sequence Modeling
  Paper • 2507.07955 • Published • 24
- Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
  Paper • 2505.02567 • Published • 79
- Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference
  Paper • 2508.02193 • Published • 128

- Nuclear Norm Regularization for Deep Learning
  Paper • 2405.14544 • Published • 1
- Token embeddings violate the manifold hypothesis
  Paper • 2504.01002 • Published • 1
- Approximate Nullspace Augmented Finetuning for Robust Vision Transformers
  Paper • 2403.10476 • Published • 1
- ElaLoRA: Elastic & Learnable Low-Rank Adaptation for Efficient Model Fine-Tuning
  Paper • 2504.00254 • Published • 1

- Reinforcement Pre-Training
  Paper • 2506.08007 • Published • 256
- A Survey on Latent Reasoning
  Paper • 2507.06203 • Published • 89
- Language Models are Few-Shot Learners
  Paper • 2005.14165 • Published • 16
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
  Paper • 1910.10683 • Published • 14

- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
  Paper • 2404.02258 • Published • 107
- Textbooks Are All You Need
  Paper • 2306.11644 • Published • 146
- Jamba: A Hybrid Transformer-Mamba Language Model
  Paper • 2403.19887 • Published • 112
- Large Language Models Struggle to Learn Long-Tail Knowledge
  Paper • 2211.08411 • Published • 3