# Kemcho-Gemma-3-1B-IT
Kemcho-Gemma-3-1B-IT is a fine-tuned model based on `google/gemma-3-1b-it` for Gujarati instruction following and general assistant tasks. Training used LoRA, and the adapter was merged into the base weights for single-repo deployment.
- Architecture: Gemma 3 (1B), decoder-only
- Weights: `model.safetensors` (~4 GB, bf16 recommended)
- Tokenizer: included (with `chat_template.jinja`)
- Best for: Gujarati chat, rewriting, summarization, simple Q&A
## Quick Start
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "shalinm/Kemcho-Gemma-3-1B-IT"

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # or "auto"
    device_map="auto",
)

# System prompt: "You are a helpful Gujarati assistant."
# User prompt: "Combine the given sentences into one sentence." followed by
# two sentences ("I was late for work. I had to take my kids to school.").
messages = [
    {"role": "system", "content": "તમે મદદરૂપ ગુજરાતી સહાયક છો."},
    {"role": "user", "content": "આપેલ વાક્યોને એક વાક્યમાં ભેગા કરો.\n\nહું કામ માટે મોડો પડ્યો હતો. મારે મારા બાળકોને શાળાએ લઈ જવાનું હતું."},
]

# Build the Gemma chat prompt and tokenize it in one step.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
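The decoded string above includes the prompt. A minimal follow-up sketch, assuming the `inputs` tensor from the snippet above, that keeps only the newly generated tokens:

```python
# Slice off the prompt tokens so only the model's reply is decoded.
generated = outputs[0][inputs.shape[-1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))
```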
## Model Details
- Base model: `google/gemma-3-1b-it`
- Training method: SFT using LoRA, then merged into base weights
- LoRA config (pre-merge, see the sketch after this list):
  - rank: 16
  - alpha: 32
  - dropout: 0.1
  - target modules: attention `q_proj`, `k_proj`, `v_proj`, `o_proj`; MLP `up_proj`, `down_proj`, `gate_proj`
- Trainable params (pre-merge): 9.69M of ~1.01B (0.96%)
- Precision: trained/evaluated in bfloat16
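A minimal sketch of an equivalent `peft` adapter configuration, reconstructed from the values listed above; the exact settings used in training may have differed.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# LoRA configuration matching the rank/alpha/dropout and target modules above.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention
        "up_proj", "down_proj", "gate_proj",     # MLP
    ],
    task_type="CAUSAL_LM",
)

base = AutoModelForCausalLM.from_pretrained("google/gemma-3-1b-it")
peft_model = get_peft_model(base, lora_config)

# Should report on the order of 9.69M trainable params (~0.96% of the model).
peft_model.print_trainable_parameters()
```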
## Languages
- Primary: Gujarati (gu, gu-IN)
- Secondary: Handles English but not tuned for it
## Training
- Hardware: 1 x A40
- Epochs: 3
- Logged steps: 300
- Checkpoints: `safetensors` (see the merging sketch below)
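A minimal sketch of how a trained LoRA adapter could be merged back into the base weights and saved as standalone `safetensors` for single-repo deployment. The adapter path is hypothetical; the published repo already contains merged weights.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model and attach the (hypothetical) LoRA checkpoint.
base = AutoModelForCausalLM.from_pretrained("google/gemma-3-1b-it", torch_dtype=torch.bfloat16)
merged = PeftModel.from_pretrained(base, "path/to/lora-checkpoint").merge_and_unload()

# Save the merged model and tokenizer as a single deployable repo.
merged.save_pretrained("Kemcho-Gemma-3-1B-IT", safe_serialization=True)
AutoTokenizer.from_pretrained("google/gemma-3-1b-it").save_pretrained("Kemcho-Gemma-3-1B-IT")
```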
## Data
- Instruction-following dataset
  - Split: `train`
  - Fields: `instruction`, `input`, `output`
  - Size: 51,978 examples
  - Example (see the formatting sketch after this list):
    - instruction: "આપેલ વાક્યોને એક વાક્યમાં ભેગા કરો." ("Combine the given sentences into one sentence.")
    - input: "હું કામ માટે મોડો પડ્યો હતો. મારે મારા બાળકોને શાળાએ લઈ જવાનું હતું." ("I was late for work. I had to take my kids to school.")
    - output: "હું કામ માટે મોડો પડ્યો હતો કારણે કે મારે મારા બાળકોને શાળાએ લઈ જવાનું હતું." ("I was late for work because I had to take my kids to school.")
- Plain-text dataset (auxiliary eval/perplexity)
  - Splits: `train` 800, `test` 200
  - Field: `text`
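A minimal sketch of turning one `instruction`/`input`/`output` record into a Gemma chat-formatted training string. The joining of instruction and input into a single user turn is an assumption; the actual prompt template used in training may differ.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("shalinm/Kemcho-Gemma-3-1B-IT")

# One record with the fields listed above (taken from the example shown).
example = {
    "instruction": "આપેલ વાક્યોને એક વાક્યમાં ભેગા કરો.",
    "input": "હું કામ માટે મોડો પડ્યો હતો. મારે મારા બાળકોને શાળાએ લઈ જવાનું હતું.",
    "output": "હું કામ માટે મોડો પડ્યો હતો કારણે કે મારે મારા બાળકોને શાળાએ લઈ જવાનું હતું.",
}

# Instruction + input become the user turn; output becomes the assistant turn.
messages = [
    {"role": "user", "content": example["instruction"] + "\n\n" + example["input"]},
    {"role": "assistant", "content": example["output"]},
]
text = tokenizer.apply_chat_template(messages, tokenize=False)
print(text)
```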
## Training/Validation Loss (snapshot)

| step | train_loss | val_loss |
|---|---|---|
| 300 | 1.1884 | 1.7497 |
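If the validation loss is interpreted as mean token-level cross-entropy in nats (an assumption, not stated in the training logs), it corresponds to a perplexity of roughly exp(1.7497) ≈ 5.75.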
## Intended Uses and Limitations
- Intended: Gujarati assistant tasks such as Q&A, rewriting, summarization, and everyday instructions.
- Not intended: Safety-critical uses, factual lookup without verification, generating harmful content.
### Limitations
- May hallucinate facts or reflect biases in data.
- Not optimized for long-context reasoning or code-heavy tasks.
- Non-Gujarati performance is not specifically tuned.
## Safety
- Use with content filtering and human oversight for sensitive domains.
- Consider additional alignment or safety fine-tuning for production.
## License
- Weights are derived from `google/gemma-3-1b-it`. Use is subject to the Gemma license and terms set by Google.
- Ensure compliance with the licenses of any datasets used during fine-tuning.
## Acknowledgements
- Base model: `google/gemma-3-1b-it` by Google
- Built with Hugging Face `transformers`, `peft`, and `datasets`