Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts (arXiv:2406.12034, published Jun 17, 2024)
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention (arXiv:2405.12981, published May 21, 2024)
Data Engineering for Scaling Language Models to 128K Context (arXiv:2402.10171, published Feb 15, 2024)