📌 Overview

A 4-bit AWQ-quantized version of google/medgemma-4b-it, converted for efficient inference with Apple's MLX library. It handles long-context tasks (192k tokens) with reduced memory and compute requirements, and retains the core capabilities of medgemma-4b-it while enabling deployment on edge devices such as Apple Silicon Macs.
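A minimal usage sketch is shown below. The card doesn't name a specific runner, so this assumes the `mlx-lm` package (`pip install mlx-lm`), the standard loader for MLX-format language models; the example prompt is illustrative only:

```python
# Minimal sketch: load the 4-bit MLX model and generate a response.
# Assumes mlx-lm is installed and the machine is Apple Silicon.
from mlx_lm import load, generate

model, tokenizer = load("Goraint/medgemma-4b-it-MLX-AWQ-4bit")

# medgemma-4b-it is instruction-tuned, so wrap the input in the
# tokenizer's chat template before generating.
messages = [
    {"role": "user", "content": "Summarize the key symptoms of iron-deficiency anemia."}
]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

response = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
print(response)
```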

Model size: 753M params (Safetensors)
Tensor types: BF16, U32
