What’s MXFP4? The 4-Bit Secret Powering OpenAI’s GPT‑OSS Models on Modest Hardware, by RakshitAralimatti (16 days ago)
KV Caching Explained: Optimizing Transformer Inference Efficiency, by not-lain (Jan 30)
ChatML vs Harmony: Understanding the new Format from OpenAI 🔍, by kuotient (15 days ago)
Assisted Generation: a new direction toward low-latency text generation, by joaogante (May 11, 2023)
Fine-tuning Llama 2 70B using PyTorch FSDP, by smangrul and 3 others (Sep 13, 2023)