medgemma-4b-it — medical fine-tune (5-bit GGUF)

Model Details

  • Base model: medgemma-4b-it (gemma3 architecture, ~3.88B parameters)
  • Quantization: Q5_K_M (5-bit GGUF), packaged for CPU inference

Files

  • medgemma-4b-it-finnetunned-merged_new_for_cpu_q5_k_m.gguf (~2.83 GB)

How to run (llama.cpp)

# Requires a llama.cpp build with download (CURL) support. Fetch and run directly from the Hub:
llama-cli --hf-repo sharadsnaik/medgemma-4b-it-medical-gguf --hf-file medgemma-4b-it-finnetunned-merged_new_for_cpu_q5_k_m.gguf -p "Hello"
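
If your build lacks network support, you can download the file first with the huggingface-cli tool (installed with huggingface_hub) and point llama-cli at the local path:

huggingface-cli download sharadsnaik/medgemma-4b-it-medical-gguf \
  medgemma-4b-it-finnetunned-merged_new_for_cpu_q5_k_m.gguf --local-dir .
llama-cli -m ./medgemma-4b-it-finnetunned-merged_new_for_cpu_q5_k_m.gguf -p "Hello"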

How to run (llama-cpp-python)

from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download the GGUF file from the Hub (cached locally after the first call)
p = hf_hub_download("sharadsnaik/medgemma-4b-it-medical-gguf",
                    "medgemma-4b-it-finnetunned-merged_new_for_cpu_q5_k_m.gguf")

# chat_format="gemma" applies the Gemma instruction template to chat messages
llm = Llama(model_path=p, n_ctx=4096, n_threads=8, chat_format="gemma")
print(llm.create_chat_completion(messages=[{"role": "user", "content": "Hello"}]))
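
For interactive use you may want streaming output. A minimal sketch using llama-cpp-python's streaming chat API (the prompt is illustrative):

# Reuses the `llm` object created above; tokens are printed as they arrive.
for chunk in llm.create_chat_completion(
    messages=[{"role": "user", "content": "What are common symptoms of iron-deficiency anemia?"}],
    stream=True,
    max_tokens=256,
):
    delta = chunk["choices"][0]["delta"]
    if "content" in delta:
        print(delta["content"], end="", flush=True)
print()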

Training Details

Training Data

Fine-tuned on the ruslanmv/ai-medical-chatbot dataset.
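
To take a quick look at the data, a short sketch using the datasets library (inspect the printed features for the actual column names):

from datasets import load_dataset

# Load the fine-tuning dataset from the Hub.
ds = load_dataset("ruslanmv/ai-medical-chatbot", split="train")
print(ds)      # features and row count
print(ds[0])   # one raw example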

Sample Code Usage

A minimal Gradio chat app (for example, for a CPU-only Hugging Face Space):

app.py

import os, gradio as gr
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Your model repo + filename
REPO_ID = "sharadsnaik/medgemma-4b-it-medical-gguf"
FILENAME = "medgemma-4b-it-finnetunned-merged_new_for_cpu_q5_k_m.gguf"

# Download from Hub to local cache
MODEL_PATH = hf_hub_download(repo_id=REPO_ID, filename=FILENAME, repo_type="model")

# Create the llama.cpp model
# Use all available CPU threads; chat_format="gemma" matches Gemma-style prompts
llm = Llama(
    model_path=MODEL_PATH,
    n_ctx=4096,
    n_threads=os.cpu_count(),
    chat_format="gemma"  # important for Gemma/Med-Gemma instruction formatting
)

def chat_fn(message, history):
    # Convert Gradio's tuple-style history (pairs of user/bot turns) -> OpenAI-style messages
    messages = []
    for user_msg, bot_msg in history:
        messages.append({"role":"user","content":user_msg})
        if bot_msg:
            messages.append({"role":"assistant","content":bot_msg})
    messages.append({"role":"user","content":message})

    out = llm.create_chat_completion(messages=messages, temperature=0.6, top_p=0.95)
    reply = out["choices"][0]["message"]["content"]
    return reply

demo = gr.ChatInterface(fn=chat_fn, title="MedGemma 4B (Q5_K_M) — CPU Space")

if __name__ == "__main__":
    demo.launch()
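
To deploy this as a CPU Space, a minimal requirements.txt along these lines should work (unpinned here; pin versions for reproducible builds):

requirements.txt

gradio
llama-cpp-python
huggingface_hub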