Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published Feb 16, 2025 • 166
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22, 2025 • 435
Article MiniMax-01 is Now Open-Source: Scaling Lightning Attention for the AI Agent Era • Published Jan 15, 2025 • 48
UnifiedCrawl: Aggregated Common Crawl for Affordable Adaptation of LLMs on Low-Resource Languages Paper • 2411.14343 • Published Nov 21, 2024 • 7
nvidia/Llama-3.1-Nemotron-70B-Instruct-HF Text Generation • 71B • Updated Apr 13, 2025 • 4.22k • 2.06k
Reply: Interesting, but how does this approach generalize to arbitrary user query / document domains? Would you need to train a separate network for each domain / dataset?