Gemma-3N 4B Persian - General Knowledge

🇮🇷 Persian Language Model | 🤖 Conversational AI | 📚 General Knowledge

Model Description

This model is a fine-tuned version of unsloth/gemma-3n-E4B-it, optimized for Persian (Farsi) conversational tasks focused on general knowledge. It employs QLoRA techniques for efficient adaptation and is merged into a standalone model suitable for deployment.

Model Details

Base Model and Architecture

  • Base Model: unsloth/gemma-3n-E4B-it (Google Gemma 3N 4B Instruction-Tuned).
  • Model Type: Causal language model.
  • Model Size: Approximately 9.9 GB (16-bit precision).
  • Context Length: Supports up to 32,768 tokens; fine-tuning used a 4,000-token maximum sequence length.
  • Vocabulary: Gemma tokenizer vocabulary.

Intended Uses

This model is designed for direct use in Persian conversational AI, including instruction-following and general knowledge queries in domains such as Persian heritage, programming, architecture, and tourism. It is suitable for downstream applications like chat interfaces or educational tools. Out-of-scope uses include non-Persian languages or safety-critical applications.
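
A minimal inference sketch with the Hugging Face transformers library is shown below. The generation settings and the exact Auto class are assumptions (Gemma 3n class mappings vary across transformers versions), not settings prescribed by this card.

```python
# Minimal inference sketch. Assumes a transformers release with Gemma 3n support
# and ~10 GB of free GPU VRAM for FP16 weights; the Auto class used here is an
# assumption and may differ depending on your transformers version.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mshojaei77/gemma-3n-E4B-persian"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Persian prompt: "Introduce three historical attractions of Isfahan."
messages = [{"role": "user", "content": "سه جاذبه تاریخی اصفهان را معرفی کن."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```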

Training

Training Data

  • Dataset: mshojaei77/persian-gk (cleaned version: mshojaei77/persian-gk-cleaned), comprising 5,897 Persian conversations in ChatML format (an illustrative record layout is sketched after this list).
  • Domains: Programming, Persian heritage, architecture, tourism, and general Q&A.
  • License: CC-BY-4.0.
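
For illustration, a single ChatML-style conversation record might look like the sketch below. The field names and the example exchange are assumptions for readability, not verbatim entries from persian-gk.

```python
# Hypothetical illustration of one ChatML-style record; field names and content
# are assumed, not copied from the dataset.
example_record = {
    "messages": [
        {"role": "user", "content": "پایتخت ایران کجاست؟"},          # "What is the capital of Iran?"
        {"role": "assistant", "content": "تهران پایتخت ایران است."},  # "Tehran is the capital of Iran."
    ]
}
```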

Training Procedure

The model was fine-tuned using QLoRA with 4-bit quantization; a configuration sketch follows the parameter list below.

  • LoRA Parameters: Rank=8, alpha=16, dropout=0.0; target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj.
  • Hyperparameters: Learning rate=2e-5, batch size=2 (effective=8 with gradient accumulation=4), epochs=1, optimizer=AdamW 8-bit, weight decay=0.01, warmup steps=10, linear LR scheduler, seed=3407.
  • Framework: Unsloth with Weights & Biases monitoring.
  • Infrastructure: Google Colab with GPU acceleration.
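
The sketch below assembles these settings with Unsloth and TRL's SFTTrainer. Dataset column names, chat-template preprocessing, and the output directory are assumptions; the actual training script may differ in details.

```python
# Hedged sketch of the QLoRA recipe described above, using Unsloth + TRL.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer
from unsloth import FastModel

# Load the 4-bit quantised base model (QLoRA) with the 4,000-token training length.
model, tokenizer = FastModel.from_pretrained(
    "unsloth/gemma-3n-E4B-it",
    max_seq_length=4000,
    load_in_4bit=True,
)

# Attach LoRA adapters with the parameters listed above.
model = FastModel.get_peft_model(
    model,
    r=8,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    random_state=3407,
)

# Assumption: each record carries a ChatML-style "messages" list; render it to a
# single training string with the tokenizer's chat template.
dataset = load_dataset("mshojaei77/persian-gk", split="train")
dataset = dataset.map(
    lambda ex: {"text": tokenizer.apply_chat_template(ex["messages"], tokenize=False)}
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="outputs",            # assumed output directory
        dataset_text_field="text",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,   # effective batch size 8
        num_train_epochs=1,
        learning_rate=2e-5,
        optim="adamw_8bit",
        weight_decay=0.01,
        warmup_steps=10,
        lr_scheduler_type="linear",
        seed=3407,
        report_to="wandb",               # Weights & Biases monitoring
    ),
)
trainer.train()
```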

The merging process integrated LoRA adapters into the base model, converting to 16-bit precision for standalone use.
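
A minimal sketch of this merge step, assuming Unsloth's save_pretrained_merged helper and the model/tokenizer objects from the training sketch above:

```python
# Hedged sketch: fold the LoRA adapters into the base weights and save a
# standalone 16-bit checkpoint (model/tokenizer come from the training sketch).
model.save_pretrained_merged(
    "gemma-3n-E4B-persian",      # output directory (name assumed)
    tokenizer,
    save_method="merged_16bit",  # merge adapters and store FP16 weights
)
```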

Evaluation Results

The model achieved a final training loss of 1.78, with gradient norms stabilizing between 0.7 and 2.0. Training completed in 2 hours and 20 minutes on a T4 GPU.

Inference performance:

| Scenario      | GPU        | Runtime (max_new_tokens=256) | Throughput            |
|---------------|------------|------------------------------|-----------------------|
| Single prompt | T4 (16 GB) | 8.5 s                        | 22 tok/s              |
| Batch of 4    | T4 (16 GB) | 19 s                         | 54 tok/s (aggregated) |
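
The single-prompt figure can be reproduced roughly as in the sketch below, reusing the model and tokenizer from the inference example above; the prompt and timing method are assumptions.

```python
# Rough throughput check: time a 256-token generation and divide the number of
# new tokens by wall-clock seconds. Reuses `model` and `tokenizer` from the
# earlier inference sketch.
import time

import torch

prompt = "درباره معماری سنتی ایران توضیح بده."  # "Explain traditional Iranian architecture."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

start = time.perf_counter()
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=256)
elapsed = time.perf_counter() - start

new_tokens = output.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens} tokens in {elapsed:.1f} s ≈ {new_tokens / elapsed:.1f} tok/s")
```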

For detailed analyses of training dynamics, including loss and gradient norm charts, refer to the technical report.

Bias, Risks, and Limitations

Limitations

  • Language Scope: The model is optimised for Persian (Farsi). Responses in other languages may be less fluent or factually reliable.
  • Knowledge Cut-off: Training data ends at January 2024; the model lacks awareness of subsequent events.
  • Hallucination: Like other LLMs, it can generate plausible-sounding but incorrect or fabricated information. Always verify critical outputs.
  • Context Window: Although the architecture supports 32 k tokens, prompts exceeding 4 k tokens were not present during training and may degrade performance.
  • Domain Transfer: Performance may drop on highly specialised or safety-critical domains (medical, legal, financial) that are under-represented in the dataset.
  • Compute Requirements: FP16 inference needs ≈ 10 GB GPU VRAM; use 8-bit/4-bit quantisation for lower-resource devices (see the loading sketch after this list).
  • Dataset Scale: Limited to ~6k pairs, potentially overlooking linguistic diversity.
  • Training Regimen: Single-epoch training may not fully optimize performance.
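
A hedged sketch of 4-bit loading for lower-VRAM devices, assuming the bitsandbytes integration in transformers; the class choice and quantization settings are illustrative, not prescribed by this card.

```python
# 4-bit loading sketch for lower-resource devices. Assumes bitsandbytes is
# installed; the Auto class and NF4 settings are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mshojaei77/gemma-3n-E4B-persian"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```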

Ethical & Safety Considerations

  • The model may reflect cultural or societal biases found in the source data.
  • Do not rely on the model as the sole source of truth for professional advice (medical, legal, financial, etc.).
  • Implement content filtering and human oversight when deploying user-facing applications, especially for minors or vulnerable groups.
  • Comply with the Gemma Terms of Use, dataset licence (CC-BY-4.0), and local regulations on user privacy and content moderation.
  • Potential for misuse in generating harmful content; mitigations include prompt engineering and output filtering.

Environmental Impact

Training emitted approximately 0.5 kg CO₂ equivalent, based on GPU usage and regional electricity factors.

Reproduction

For detailed technical information about the training process, methodology, and evaluation results, see the technical report.

Related Resources

  • Base Model: unsloth/gemma-3n-E4B-it.
  • Adapters: mshojaei77/gemma-3n-E4B-persian-lora-adapters.
  • Dataset: mshojaei77/persian-gk.
  • GitHub: mshojaei77/gemma-3n-E4B-persian-qlora.
  • Frameworks: Unsloth, PEFT, and Transformers; methods: LoRA (arXiv:2106.09685) and QLoRA (arXiv:2305.14314).

Citation

@misc{gemma3n_persian_2025,
  title={Gemma-3N 4B Persian Fine-tuned Model},
  author={Shojaei, M.},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/mshojaei77/gemma-3n-E4B-persian},
  note={Fine-tuned using QLoRA on Persian General Knowledge dataset}
}

Dataset citation:

@misc{persian_gk_2025,
  title={persian-gk: Persian General Knowledge Chat Dataset},
  author={Shojaei, M. and Contributors},
  year={2025},
  url={https://huggingface.co/datasets/mshojaei77/persian-gk}
}

License

Licensed under the Gemma Terms of Use (https://ai.google.dev/gemma/terms). Downstream users must adhere to these terms.

Acknowledgments

Thanks to Google for the Gemma architecture, the Unsloth team for training tools, Hugging Face for hosting, and the Persian NLP community for contributions.
