Kemcho-Gemma-3-1B-IT

Kemcho-Gemma-3-1B-IT is fine-tuned from google/gemma-3-1b-it for Gujarati instruction following and general assistant tasks. Training used LoRA adapters, which were merged into the base weights for single-repo deployment.

  • Architecture: Gemma 3 (1B) decoder-only
  • Weights: model.safetensors (~4 GB, stored in float32; bfloat16 recommended for inference)
  • Tokenizer: Included (with chat_template.jinja)
  • Best for: Gujarati chat, rewriting, summarization, simple Q&A

Quick Start

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "shalinm/Kemcho-Gemma-3-1B-IT" 

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # or "auto"
    device_map="auto",
)

messages = [
    # System prompt: "You are a helpful Gujarati assistant."
    {"role": "system", "content": "તમે મદદરૂપ ગુજરાતી સહાયક છો."},
    # User turn: "Combine the given sentences into one sentence." followed by
    # "I was late for work. I had to take my kids to school."
    {"role": "user", "content": "આપેલ વાક્યોને એક વાક્યમાં ભેગા કરો.\n\nહું કામ માટે મોડો પડ્યો હતો. મારે મારા બાળકોને શાળાએ લઈ જવાનું હતું."},
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))

Model Details

  • Base model: google/gemma-3-1b-it
  • Training method: SFT using LoRA, then merged into base weights
  • LoRA config (pre-merge; see the peft sketch after this list):
    • rank: 16
    • alpha: 32
    • dropout: 0.1
    • target modules: q_proj, k_proj, v_proj, o_proj (attention); up_proj, down_proj, gate_proj (MLP)
  • Trainable params (pre-merge): 9.69M of ~1.01B (0.96%)
  • Precision: trained/evaluated in bfloat16
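
The LoRA configuration above maps onto peft roughly as follows. This is a minimal sketch, not the actual training script; anything beyond the listed rank/alpha/dropout/target modules is an assumption.

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("google/gemma-3-1b-it", torch_dtype="auto")

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "up_proj", "down_proj", "gate_proj"],
    task_type="CAUSAL_LM",
)

peft_model = get_peft_model(base, lora_config)
peft_model.print_trainable_parameters()  # should report ~9.69M trainable params (~0.96%)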

Languages

  • Primary: Gujarati (gu, gu-IN)
  • Secondary: Handles English but not tuned for it

Training

  • Hardware: 1 x A40
  • Epochs: 3
  • Logged steps: 300
  • Checkpoints: safetensors (see the merge-and-save sketch after this list)
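
As noted in Model Details, the LoRA adapter was merged into the base weights before release, and checkpoints are saved as safetensors. A minimal sketch of that merge-and-save step with peft, assuming a hypothetical local adapter directory:

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("google/gemma-3-1b-it", torch_dtype=torch.bfloat16)
peft_model = PeftModel.from_pretrained(base, "path/to/lora-adapter")  # hypothetical adapter path

merged = peft_model.merge_and_unload()  # fold the LoRA deltas into the base weights
merged.save_pretrained("Kemcho-Gemma-3-1B-IT", safe_serialization=True)  # writes model.safetensors

tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-1b-it")
tokenizer.save_pretrained("Kemcho-Gemma-3-1B-IT")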

Data

  1. Instruction-following dataset
  • Split: train
  • Fields: instruction, input, output (see the formatting sketch after this list)
  • Size: 51,978 examples
  • Example:
    • instruction: "આપેલ વાક્યોને એક વાક્યમાં ભેગા કરો." ("Combine the given sentences into one sentence.")
    • input: "હું કામ માટે મોડો પડ્યો હતો. મારે મારા બાળકોને શાળાએ લઈ જવાનું હતું." ("I was late for work. I had to take my kids to school.")
    • output: "હું કામ માટે મોડો પડ્યો હતો કારણે કે મારે મારા બાળકોને શાળાએ લઈ જવાનું હતું." ("I was late for work because I had to take my kids to school.")
  2. Plain-text dataset (auxiliary eval/perplexity)
  • Splits: train 800, test 200
  • Field: text
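
The exact prompt format used to turn (instruction, input, output) rows into training text is not documented here; one plausible mapping using the bundled chat template looks like the sketch below, where the field-to-message mapping is an assumption.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("shalinm/Kemcho-Gemma-3-1B-IT")

def format_example(example):
    # Assumed mapping: instruction (+ optional input) becomes the user turn,
    # output becomes the assistant turn.
    user_text = example["instruction"]
    if example.get("input"):
        user_text += "\n\n" + example["input"]
    messages = [
        {"role": "user", "content": user_text},
        {"role": "assistant", "content": example["output"]},
    ]
    # Render the conversation as a single training string (no tokenization here).
    return tokenizer.apply_chat_template(messages, tokenize=False)

row = {
    "instruction": "આપેલ વાક્યોને એક વાક્યમાં ભેગા કરો.",
    "input": "હું કામ માટે મોડો પડ્યો હતો. મારે મારા બાળકોને શાળાએ લઈ જવાનું હતું.",
    "output": "હું કામ માટે મોડો પડ્યો હતો કારણે કે મારે મારા બાળકોને શાળાએ લઈ જવાનું હતું.",
}
print(format_example(row))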

Training/Validation Loss (snapshot)

step train_loss val_loss
300 1.1884 1.7497
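
If the logged validation loss is the standard mean cross-entropy per token (in nats), this snapshot corresponds to a validation perplexity of exp(1.7497) ≈ 5.75.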

Intended Uses and Limitations

  • Intended: Gujarati assistant tasks—Q&A, rewriting, summarization, everyday instructions.
  • Not intended: Safety-critical uses, factual lookup without verification, generating harmful content.

Limitations

  • May hallucinate facts or reflect biases in data.
  • Not optimized for long-context reasoning or code-heavy tasks.
  • Non-Gujarati performance is not specifically tuned.

Safety

  • Use with content filtering and human oversight for sensitive domains.
  • Consider additional alignment or safety fine-tuning for production.

License

  • Weights derived from google/gemma-3-1b-it. Use is subject to the Gemma license and terms set by Google.
  • Ensure compliance with the licenses of any datasets used during fine-tuning.

Acknowledgements

  • Base model: google/gemma-3-1b-it by Google
  • Built with Hugging Face transformers, peft, and datasets