- A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
  Paper • 2312.08578 • Published • 20
- ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks
  Paper • 2312.08583 • Published • 12
- Vision-Language Models as a Source of Rewards
  Paper • 2312.09187 • Published • 14
- StemGen: A music generation model that listens
  Paper • 2312.08723 • Published • 49
Collections
Collections including paper arxiv:2403.07816
- Gemma: Open Models Based on Gemini Research and Technology
  Paper • 2403.08295 • Published • 51
- LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
  Paper • 2403.13372 • Published • 128
- Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
  Paper • 2403.07816 • Published • 44
- MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks
  Paper • 2311.07463 • Published • 15
- Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
  Paper • 2403.07816 • Published • 44
- microsoft/phi-1_5
  Text Generation • 1B • Updated • 120k • 1.34k
- Language models scale reliably with over-training and on downstream tasks
  Paper • 2403.08540 • Published • 15
- Akashpb13/Swahili_xlsr
  Automatic Speech Recognition • 0.3B • Updated • 118 • 8
- Scaling Instruction-Finetuned Language Models
  Paper • 2210.11416 • Published • 7
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
  Paper • 2312.00752 • Published • 145
- Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
  Paper • 2403.05530 • Published • 66
- Yi: Open Foundation Models by 01.AI
  Paper • 2403.04652 • Published • 66
- Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
  Paper • 2403.07816 • Published • 44
- OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
  Paper • 2402.01739 • Published • 29
- MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
  Paper • 2401.15947 • Published • 54
- Mixture-of-LoRAs: An Efficient Multitask Tuning for Large Language Models
  Paper • 2403.03432 • Published • 1
- Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts
  Paper • 2306.04845 • Published • 4
- Patch-level Routing in Mixture-of-Experts is Provably Sample-efficient for Convolutional Neural Networks
  Paper • 2306.04073 • Published • 2
- Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
  Paper • 2403.07816 • Published • 44
- Unified Scaling Laws for Routed Language Models
  Paper • 2202.01169 • Published • 2