Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights • arXiv:2603.12228 • Published Mar 2026 • 12 upvotes
Efficient Memory Management for Large Language Model Serving with PagedAttention • arXiv:2309.06180 • Published Sep 12, 2023 • 48 upvotes
1-bit AI Infra: Part 1.1, Fast and Lossless BitNet b1.58 Inference on CPUs • arXiv:2410.16144 • Published Oct 21, 2024 • 5 upvotes
Mamba: Linear-Time Sequence Modeling with Selective State Spaces • arXiv:2312.00752 • Published Dec 1, 2023 • 150 upvotes
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality • arXiv:2405.21060 • Published May 31, 2024 • 68 upvotes
KV Cache Transform Coding for Compact Storage in LLM Inference • arXiv:2511.01815 • Published Nov 3, 2025 • 3 upvotes
TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate • arXiv:2504.19874 • Published Apr 28, 2025 • 23 upvotes
QJL: 1-Bit Quantized JL Transform for KV Cache Quantization with Zero Overhead • arXiv:2406.03482 • Published Jun 5, 2024 • 1 upvote
PolarQuant: Quantizing KV Caches with Polar Transformation • arXiv:2502.02617 • Published Feb 4, 2025 • 1 upvote
MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens • arXiv:2603.23516 • Published Mar 2026 • 40 upvotes
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale • arXiv:2010.11929 • Published Oct 22, 2020 • 15 upvotes
EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test • arXiv:2503.01840 • Published Mar 3, 2025 • 9 upvotes
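
Several entries above (TurboQuant, QJL, PolarQuant, and the KV cache transform-coding paper) build on one shared primitive: the KV cache is stored at a few bits per value so long contexts fit in memory. The sketch below is a minimal illustration of that primitive only, not the method of any listed paper; it applies plain per-token asymmetric uniform quantization of a key tensor to 8-bit codes and checks the roundtrip error. All names, shapes, and parameters here are illustrative assumptions.

```python
import numpy as np

def quantize_kv(x: np.ndarray, n_bits: int = 8):
    """Per-token asymmetric uniform quantization.

    x: float array of shape (num_tokens, head_dim).
    Returns (codes, scale, zero_point) such that, row-wise,
    x ~= (codes - zero_point) * scale.
    """
    qmax = 2 ** n_bits - 1
    lo = x.min(axis=-1, keepdims=True)            # per-token min
    hi = x.max(axis=-1, keepdims=True)            # per-token max
    scale = np.maximum(hi - lo, 1e-8) / qmax      # step size; guard /0
    zero_point = np.round(-lo / scale)            # code assigned to x = lo
    codes = np.clip(np.round(x / scale) + zero_point, 0, qmax)
    return codes.astype(np.uint8), scale, zero_point

def dequantize_kv(codes, scale, zero_point):
    return (codes.astype(np.float32) - zero_point) * scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-in for one head's key cache: 1024 tokens, head_dim 128.
    keys = rng.standard_normal((1024, 128)).astype(np.float32)
    codes, scale, zp = quantize_kv(keys)
    recon = dequantize_kv(codes, scale, zp)
    # 8-bit codes store the cache at 4x less than fp32; this prints the cost.
    print("max abs roundtrip error:", np.abs(keys - recon).max())
```

Judging by their titles, the listed papers push past this baseline by transforming the coordinates before quantizing (a JL projection in QJL, polar coordinates in PolarQuant, transform coding in the KVTC paper), which is what allows far fewer than 8 bits per value.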