Collections
Collections including paper arxiv:2506.20920
- Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning (Paper • 2506.07044 • Published • 112)
- ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning (Paper • 2506.09513 • Published • 98)
- AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time (Paper • 2505.24863 • Published • 97)
- Seedance 1.0: Exploring the Boundaries of Video Generation Models (Paper • 2506.09113 • Published • 102)

- FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language (Paper • 2506.20920 • Published • 69)
- SmolVLM: Redefining small and efficient multimodal models (Paper • 2504.05299 • Published • 197)
- YourBench: Easy Custom Evaluation Sets for Everyone (Paper • 2504.01833 • Published • 22)
- SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model (Paper • 2502.02737 • Published • 242)

- FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language (Paper • 2506.20920 • Published • 69)
- HuggingFaceFW/fineweb-2 (Viewer • Updated • 5.02B • 57.4k • 624)
- Scaling FineWeb to 1000+ languages: Step 1: finding signal in 100s of evaluation tasks (68)
- 📝 Evaluate multilingual models using FineTasks

- NExT-GPT: Any-to-Any Multimodal LLM (Paper • 2309.05519 • Published • 78)
- DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models (Paper • 2309.03883 • Published • 35)
- apple/DCLM-7B (7B • Updated • 52 • 831)
- Aria: An Open Multimodal Native Mixture-of-Experts Model (Paper • 2410.05993 • Published • 112)

- ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation (Paper • 2506.18095 • Published • 65)
- FreedomIntelligence/ShareGPT-4o-Image (Viewer • Updated • 92.3k • 1.81k • 87)
- FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language (Paper • 2506.20920 • Published • 69)

- MUVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings (Paper • 2405.19504 • Published • 3)
- HiWave: Training-Free High-Resolution Image Generation via Wavelet-Based Diffusion Sampling (Paper • 2506.20452 • Published • 19)
- FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language (Paper • 2506.20920 • Published • 69)
- The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm (Paper • 2507.18553 • Published • 39)

- The Curse of Depth in Large Language Models (Paper • 2502.05795 • Published • 41)
- Transformers without Normalization (Paper • 2503.10622 • Published • 168)
- Parallel Scaling Law for Language Models (Paper • 2505.10475 • Published • 83)
- Learning to Skip the Middle Layers of Transformers (Paper • 2506.21103 • Published • 17)