Local Installation Video and Testing - Step by Step
Hi,
Kudos on producing such a sublime model. I made a local installation and testing video:
https://youtu.be/tMZSo21cIPs?si=SkGjJglyclwE7_jz
Thanks and regards,
Fahd
This may not be seen, but it would be great to see some Gemma 3 models tuned for text-embedding benchmarks (e.g., the MTEB Leaderboard). In most of my LLM work I use embedding models like the Qwen3-Embedding series, but there are currently very few high-quality alternatives.
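For reference, this is roughly the embedding workflow I mean; a minimal sketch using sentence-transformers with a Qwen3-Embedding checkpoint (the model id and example texts are just placeholders):

from sentence_transformers import SentenceTransformer

# Placeholder model id; assumes sentence-transformers >= 3.x for the
# similarity helper.
embedder = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")

queries = ["how do transformers handle long contexts?"]
documents = [
    "Rotary position embeddings extend context length in many LLMs.",
    "Gemma 3 is a family of lightweight open models from Google.",
]

query_embeddings = embedder.encode(queries)
document_embeddings = embedder.encode(documents)

# Cosine-similarity matrix between queries and documents.
print(embedder.similarity(query_embeddings, document_embeddings))

A Gemma 3 embedding variant tuned for MTEB-style tasks could drop straight into this kind of retrieval code.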
Thanks for the release :)
Hi @fahdmirzac,
Thanks for your interest and great suggestion! We're actively evaluating possible directions for fine-tuning, including for embedding use cases. Your input helps guide priorities — much appreciated!
For some odd reason I am getting stuck here:
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id)
Anyone else having this issue?
@kaliaanup I'm not having this issue! Did you fix it? Here's the code I wrote to get it to work (I write integration tests in pytest): https://github.com/InServiceOfX/InServiceOfX/blob/master/PythonLibraries/HuggingFace/MoreTransformers/tests/integration_tests/Models/LLMs/test_google_gemma-3-1b-it.py
Basically,
import torch
from transformers import AutoTokenizer, Gemma3ForCausalLM

tokenizer = AutoTokenizer.from_pretrained(
    model_path,
    local_files_only=True,
    trust_remote_code=True)
model = Gemma3ForCausalLM.from_pretrained(
    model_path,
    device_map="cuda:0",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
prompt = "What is C. elegans?"
prompt_str = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    tokenize=False)
encoded = tokenizer(prompt_str, return_tensors="pt", padding=True).to(
    model.device)
output = model.generate(
    input_ids=encoded["input_ids"],
    attention_mask=encoded["attention_mask"],
    do_sample=True,
    # temperature, min_p, repetition_penalty suggested by
    # https://huggingface.co/LiquidAI/LFM2-1.2B
    temperature=0.9,
    min_p=0.15,
    repetition_penalty=1.05,
    max_new_tokens=65536)
print(
    "With special tokens: ",
    tokenizer.decode(output[0], skip_special_tokens=False))
print(
    "Without special tokens: ",
    tokenizer.decode(output[0], skip_special_tokens=True))
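As a side note, the two-step tokenize/encode above can be collapsed into a single call by letting apply_chat_template build the tensors directly; a sketch, assuming a reasonably recent transformers version:

# One-step variant: apply_chat_template tokenizes and returns input_ids
# plus attention_mask in a single dict.
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt").to(model.device)

output = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.9,
    min_p=0.15,
    repetition_penalty=1.05,
    max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))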
I wrote my own wrapper class, but tested it on Google's gemma-3-270m-it (the wrapper class code is basically the same as above):
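The actual wrapper is in the repository linked above; purely to give an idea of the shape, a made-up minimal version (class and method names are invented here) would be along these lines:

import torch
from transformers import AutoTokenizer, Gemma3ForCausalLM


class Gemma3ChatWrapper:
    """Illustrative only; names are invented, the real code is in the repo."""

    def __init__(self, model_path, device="cuda:0"):
        self.tokenizer = AutoTokenizer.from_pretrained(
            model_path,
            local_files_only=True,
            trust_remote_code=True)
        self.model = Gemma3ForCausalLM.from_pretrained(
            model_path,
            device_map=device,
            torch_dtype=torch.bfloat16,
            trust_remote_code=True)

    def generate(self, prompt, max_new_tokens=256):
        # Build the chat-formatted prompt, encode it, and sample a reply.
        prompt_str = self.tokenizer.apply_chat_template(
            [{"role": "user", "content": prompt}],
            add_generation_prompt=True,
            tokenize=False)
        encoded = self.tokenizer(
            prompt_str, return_tensors="pt", padding=True).to(self.model.device)
        output = self.model.generate(
            **encoded,
            do_sample=True,
            temperature=0.9,
            min_p=0.15,
            repetition_penalty=1.05,
            max_new_tokens=max_new_tokens)
        return self.tokenizer.decode(output[0], skip_special_tokens=True)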