Kemcho-Gemma-3-1B-IT

Kemcho-Gemma-3-1B-IT is fine-tuned from google/gemma-3-1b-it for Gujarati instruction following and general assistant tasks. Training used LoRA adapters, which were merged into the base weights for single-repo deployment.

  • Architecture: Gemma 3 (1B) decoder-only
  • Weights: model.safetensors (~4 GB, stored in float32; bfloat16 recommended for inference)
  • Tokenizer: Included (with chat_template.jinja)
  • Best for: Gujarati chat, rewriting, summarization, simple Q&A

Quick Start

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "shalinm/Kemcho-Gemma-3-1B-IT" 

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # or "auto"
    device_map="auto",
)

messages = [
    # System prompt: "You are a helpful Gujarati assistant."
    {"role": "system", "content": "તમે મદદરૂપ ગુજરાતી સહાયક છો."},
    # User turn: "Combine the given sentences into one sentence." followed by
    # "I was late for work. I had to take my kids to school."
    {"role": "user", "content": "આપેલ વાક્યોને એક વાક્યમાં ભેગા કરો.\n\nહું કામ માટે મોડો પડ્યો હતો. મારે મારા બાળકોને શાળાએ લઈ જવાનું હતું."},
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))

Model Details

  • Base model: google/gemma-3-1b-it
  • Training method: SFT using LoRA, then merged into base weights
  • LoRA config (pre-merge; see the peft sketch after this list):
    • rank: 16
    • alpha: 32
    • dropout: 0.1
    • target modules: q_proj, k_proj, v_proj, o_proj (attention); up_proj, down_proj, gate_proj (MLP)
  • Trainable params (pre-merge): 9.69M of ~1.01B (0.96%)
  • Precision: trained/evaluated in bfloat16
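
The LoRA configuration above maps onto peft roughly as follows. This is a minimal sketch, not the actual training script; anything beyond the listed rank/alpha/dropout/target modules is an assumption.

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("google/gemma-3-1b-it", torch_dtype="auto")

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "up_proj", "down_proj", "gate_proj"],
    task_type="CAUSAL_LM",
)

peft_model = get_peft_model(base, lora_config)
peft_model.print_trainable_parameters()  # should report ~9.69M trainable params (~0.96%)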

Languages

  • Primary: Gujarati (gu, gu-IN)
  • Secondary: Handles English but not tuned for it

Training

  • Hardware: 1 x A40
  • Epochs: 3
  • Logged steps: 300
  • Checkpoints: safetensors (see the merge-and-save sketch after this list)
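
As noted in Model Details, the LoRA adapter was merged into the base weights before release, and checkpoints are saved as safetensors. A minimal sketch of that merge-and-save step with peft, assuming a hypothetical local adapter directory:

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("google/gemma-3-1b-it", torch_dtype=torch.bfloat16)
peft_model = PeftModel.from_pretrained(base, "path/to/lora-adapter")  # hypothetical adapter path

merged = peft_model.merge_and_unload()  # fold the LoRA deltas into the base weights
merged.save_pretrained("Kemcho-Gemma-3-1B-IT", safe_serialization=True)  # writes model.safetensors

tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-1b-it")
tokenizer.save_pretrained("Kemcho-Gemma-3-1B-IT")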

Data

  1. Instruction-following dataset
  • Split: train
  • Fields: instruction, input, output (see the formatting sketch after this list)
  • Size: 51,978 examples
  • Example:
    • instruction: "આપેલ વાક્યોને એક વાક્યમાં ભેગા કરો." ("Combine the given sentences into one sentence.")
    • input: "હું કામ માટે મોડો પડ્યો હતો. મારે મારા બાળકોને શાળાએ લઈ જવાનું હતું." ("I was late for work. I had to take my kids to school.")
    • output: "હું કામ માટે મોડો પડ્યો હતો કારણે કે મારે મારા બાળકોને શાળાએ લઈ જવાનું હતું." ("I was late for work because I had to take my kids to school.")
  2. Plain-text dataset (auxiliary eval/perplexity)
  • Splits: train 800, test 200
  • Field: text
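
The exact prompt format used to turn (instruction, input, output) rows into training text is not documented here; one plausible mapping using the bundled chat template looks like the sketch below, where the field-to-message mapping is an assumption.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("shalinm/Kemcho-Gemma-3-1B-IT")

def format_example(example):
    # Assumed mapping: instruction (+ optional input) becomes the user turn,
    # output becomes the assistant turn.
    user_text = example["instruction"]
    if example.get("input"):
        user_text += "\n\n" + example["input"]
    messages = [
        {"role": "user", "content": user_text},
        {"role": "assistant", "content": example["output"]},
    ]
    # Render the conversation as a single training string (no tokenization here).
    return tokenizer.apply_chat_template(messages, tokenize=False)

row = {
    "instruction": "આપેલ વાક્યોને એક વાક્યમાં ભેગા કરો.",
    "input": "હું કામ માટે મોડો પડ્યો હતો. મારે મારા બાળકોને શાળાએ લઈ જવાનું હતું.",
    "output": "હું કામ માટે મોડો પડ્યો હતો કારણે કે મારે મારા બાળકોને શાળાએ લઈ જવાનું હતું.",
}
print(format_example(row))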

Training/Validation Loss (snapshot)

step train_loss val_loss
300 1.1884 1.7497
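
If the logged validation loss is the standard mean cross-entropy per token (in nats), this snapshot corresponds to a validation perplexity of exp(1.7497) ≈ 5.75.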

Intended Uses and Limitations

  • Intended: Gujarati assistant tasks—Q&A, rewriting, summarization, everyday instructions.
  • Not intended: Safety-critical uses, factual lookup without verification, generating harmful content.

Limitations

  • May hallucinate facts or reflect biases in data.
  • Not optimized for long-context reasoning or code-heavy tasks.
  • Non-Gujarati performance is not specifically tuned.

Safety

  • Use with content filtering and human oversight for sensitive domains.
  • Consider additional alignment or safety fine-tuning for production.

License

  • Weights derived from google/gemma-3-1b-it. Use is subject to the Gemma license and terms set by Google.
  • Ensure compliance with the licenses of any datasets used during fine-tuning.

Acknowledgements

  • Base model: google/gemma-3-1b-it by Google
  • Built with Hugging Face transformers, peft, and datasets