📢 NVIDIA Releases Nemotron-CC-Math Pre-Training Dataset: A High-Quality, Web-Scale Math Corpus for Pretraining Large Language Models 6 days ago
NVIDIA Releases Improved Pretraining Dataset: Preserves High Value Math & Code, and Augments with Multi-Lingual 6 days ago
NVIDIA Releases 3 Million Sample Dataset for OCR, Visual Question Answering, and Captioning Tasks 13 days ago
Llama-NeMoRetriever-ColEmbed: Developer-Focused Guide to NVIDIA's State-of-the-Art Text-Image Retrieval Jul 9
Nemotron-Personas: Improve AI Training With the First Synthetic Personas Dataset Aligned to Real-World Distributions Jun 10
nvidia/Nemotron-Research-Reasoning-Qwen-1.5B Text Generation • 2B • Updated 12 days ago
nvidia/Llama-3_3-Nemotron-Super-49B-v1_5-FP8 Text Generation • 50B • Updated 24 days ago