# medgemma-4b-it — medical fine-tune (5-bit GGUF)
## Model Details
### Files

- `medgemma-4b-it-finnetunned-merged_new_for_cpu_q5_k_m.gguf` (~2.83 GB, Q5_K_M quantization)
## How to run (llama.cpp)

```bash
# Requires llama.cpp. You can run directly from the Hub path:
llama-cli -m hf://sharadsnaik/medgemma-4b-it-medical-gguf/medgemma-4b-it-finnetunned-merged_new_for_cpu_q5_k_m.gguf -p "Hello"
```
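If your llama.cpp build does not resolve `hf://` paths, a common alternative (a sketch, not part of the original card) is to download the file first and point `llama-cli` at the local copy:

```bash
# Download the GGUF file to the current directory, then run it locally
huggingface-cli download sharadsnaik/medgemma-4b-it-medical-gguf \
  medgemma-4b-it-finnetunned-merged_new_for_cpu_q5_k_m.gguf --local-dir .
llama-cli -m ./medgemma-4b-it-finnetunned-merged_new_for_cpu_q5_k_m.gguf -p "Hello"
```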
## How to Get Started with the Model

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download the GGUF file from the Hub, then load it with llama-cpp-python
p = hf_hub_download(
    "sharadsnaik/medgemma-4b-it-medical-gguf",
    "medgemma-4b-it-finnetunned-merged_new_for_cpu_q5_k_m.gguf",
)
llm = Llama(model_path=p, n_ctx=4096, n_threads=8, chat_format="gemma")
print(llm.create_chat_completion(messages=[{"role": "user", "content": "Hello"}]))
```
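For interactive use you may prefer tokens as they are generated. `create_chat_completion(..., stream=True)` yields OpenAI-style chunks; a minimal sketch reusing the `llm` object above (the prompt is illustrative):

```python
# Stream the reply token-by-token instead of waiting for the full response
for chunk in llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
):
    delta = chunk["choices"][0]["delta"]
    if "content" in delta:
        print(delta["content"], end="", flush=True)
print()
```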
## Training Details

### Training Data

Fine-tuned on the [ruslanmv/ai-medical-chatbot](https://huggingface.co/datasets/ruslanmv/ai-medical-chatbot) dataset.
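If you want to inspect the fine-tuning corpus yourself, a quick sketch with the `datasets` library (column layout is whatever the public dataset ships with, so nothing is assumed about field names):

```python
from datasets import load_dataset

# Peek at the fine-tuning data: features, row count, and one raw example
ds = load_dataset("ruslanmv/ai-medical-chatbot", split="train")
print(ds)
print(ds[0])
```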
## Sample Code Usage

The following `app.py` serves the model behind a Gradio chat UI (e.g., in a CPU Space):
```python
import os

import gradio as gr
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Your model repo + filename
REPO_ID = "sharadsnaik/medgemma-4b-it-medical-gguf"
FILENAME = "medgemma-4b-it-finnetunned-merged_new_for_cpu_q5_k_m.gguf"

# Download from Hub to local cache
MODEL_PATH = hf_hub_download(repo_id=REPO_ID, filename=FILENAME, repo_type="model")

# Create the llama.cpp model.
# Use all available CPU threads; chat_format="gemma" matches Gemma-style prompts.
llm = Llama(
    model_path=MODEL_PATH,
    n_ctx=4096,
    n_threads=os.cpu_count(),
    chat_format="gemma",  # important for Gemma/MedGemma instruction formatting
)

def chat_fn(message, history):
    # Convert Gradio (user, bot) pair history -> OpenAI-style messages.
    # Note: this assumes Gradio's tuple-format history; the messages format
    # used by newer ChatInterface versions would need a different conversion.
    messages = []
    for user_msg, bot_msg in history:
        messages.append({"role": "user", "content": user_msg})
        if bot_msg:
            messages.append({"role": "assistant", "content": bot_msg})
    messages.append({"role": "user", "content": message})
    out = llm.create_chat_completion(messages=messages, temperature=0.6, top_p=0.95)
    reply = out["choices"][0]["message"]["content"]
    return reply

demo = gr.ChatInterface(fn=chat_fn, title="MedGemma 4B (Q5_K_M) — CPU Space")

if __name__ == "__main__":
    demo.launch()
```
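To try the app outside a Space, install the dependencies it imports and run it directly (versions unpinned here; a sketch assuming a CPU-only environment):

```bash
pip install gradio llama-cpp-python huggingface_hub
python app.py
```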