- FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language
  Paper • 2506.20920 • Published • 69
- SmolVLM: Redefining small and efficient multimodal models
  Paper • 2504.05299 • Published • 197
- YourBench: Easy Custom Evaluation Sets for Everyone
  Paper • 2504.01833 • Published • 22
- SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
  Paper • 2502.02737 • Published • 242
Collections
Discover the best community collections!
Collections including paper arxiv:2305.16264
- MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens
  Paper • 2404.03413 • Published • 29
- Scaling Data-Constrained Language Models
  Paper • 2305.16264 • Published • 17
- Emergence of Hidden Capabilities: Exploring Learning Dynamics in Concept Space
  Paper • 2406.19370 • Published • 1

- LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL
  Paper • 2503.07536 • Published • 89
- Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model
  Paper • 2503.07703 • Published • 36
- Gemini Embedding: Generalizable Embeddings from Gemini
  Paper • 2503.07891 • Published • 42
- Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning
  Paper • 2503.07572 • Published • 47

- Data Selection for Language Models via Importance Resampling
  Paper • 2302.03169 • Published
- Scaling Data-Constrained Language Models
  Paper • 2305.16264 • Published • 17
- Challenges with unsupervised LLM knowledge discovery
  Paper • 2312.10029 • Published • 10
- How Do Large Language Models Acquire Factual Knowledge During Pretraining?
  Paper • 2406.11813 • Published • 32