Accelerate ND-Parallel: A Guide to Efficient Multi-GPU Training • By siro1 and 4 others • 17 days ago • 51
Welcome GPT OSS, the new open-source model family from OpenAI! • By reach-vb and 11 others • 20 days ago • 472
Transformers backend integration in SGLang • By marcsun13 and 4 others • Jun 23 • 53
(LoRA) Fine-Tuning FLUX.1-dev on Consumer Hardware • By derekl35 and 4 others • Jun 19 • 83
Exploring Quantization Backends in Diffusers • By derekl35 and 2 others • May 21 • 40
Introducing AutoRound: Intel’s Advanced Quantization for LLMs and VLMs • By wenhuach and 8 others • Apr 29 • 39
LLM Inference on Edge: A Fun and Easy Guide to run LLMs via React Native on your Phone! • By medmekk and 1 other • Mar 7 • 76
Fine-tuning LLMs to 1.58bit: extreme quantization made easy • By medmekk and 5 others • Sep 18, 2024 • 265
Llama 3.1 - 405B, 70B & 8B with multilinguality and long context • By philschmid and 7 others • Jul 23, 2024 • 237
quanto: a pytorch quantization toolkit • By dacorvo and 2 others • Mar 18, 2024 • 42
Overview of natively supported quantization schemes in 🤗 Transformers • By ybelkada and 4 others • Sep 12, 2023 • 12
Making LLMs lighter with AutoGPTQ and transformers • By marcsun13 and 5 others • Aug 23, 2023 • 58