Turkish-LLM-14B-Instruct
A High-Performance Turkish Language Model Fine-Tuned with SFT + DPO
Developer: Ogulcan Aydogan | Release: March 2026 | License: Apache 2.0
Table of Contents
- Overview
- Motivation
- Model Details
- Training Pipeline
- Benchmark Results
- Model Family
- Usage
- Limitations and Bias
- Citation
- Turkish Summary
Overview
Turkish-LLM-14B-Instruct is a 14.7-billion-parameter Turkish language model built on top of Qwen/Qwen2.5-14B-Instruct. It was fine-tuned in two stages -- Supervised Fine-Tuning (SFT) on curated Turkish instruction data, followed by Direct Preference Optimization (DPO) for alignment -- to deliver state-of-the-art performance on Turkish natural language understanding and generation tasks.
The model demonstrates a +0.47 point improvement on MMLU_TR over the base Qwen2.5-14B-Instruct, achieved through a two-stage SFT + DPO pipeline trained on 242K+ curated Turkish instruction examples. This is part of an ongoing effort to build a comprehensive Turkish LLM family spanning 1.5B to 72B parameters.
Motivation
Turkish is spoken by over 80 million native speakers, making it one of the most widely spoken languages in the world. Despite this, Turkish remains significantly underrepresented in the large language model ecosystem. The vast majority of frontier LLMs are trained predominantly on English data, and their Turkish capabilities are incidental rather than intentional.
This project addresses that gap directly:
- Linguistic coverage: Turkish is an agglutinative language with rich morphology, vowel harmony, and SOV word order -- properties that are poorly captured by models trained primarily on English.
- Cultural context: Effective Turkish language models require not just linguistic fluency but also an understanding of Turkish history, geography, science education curricula, and cultural norms.
- Accessibility: By releasing this model under the Apache 2.0 license and providing GGUF quantizations for local deployment, we aim to make high-quality Turkish NLP accessible to researchers, developers, and organizations across Turkey and the broader Turkic-language community.
- Benchmark-driven development: Each model version is rigorously evaluated against established Turkish benchmarks to ensure that fine-tuning yields genuine improvements rather than superficial fluency.
Model Details
| Property | Value |
|---|---|
| Developer | Ogulcan Aydogan |
| Model Name | Turkish-LLM-14B-Instruct |
| Base Model | Qwen/Qwen2.5-14B-Instruct |
| Parameters | 14.7B |
| Architecture | Transformer (decoder-only, causal language model) |
| Context Length | 4,096 tokens |
| Precision | bfloat16 |
| Fine-Tuning Method | SFT + DPO (Direct Preference Optimization) |
| Language | Turkish (tr) |
| License | Apache 2.0 |
| Release Date | March 2026 |
Training Pipeline
The model was trained in a two-stage pipeline, with each stage using parameter-efficient LoRA adapters to maximize quality while remaining computationally feasible on a single GPU.
Stage 1: Supervised Fine-Tuning (SFT)
The base model was fine-tuned on a curated Turkish instruction-following dataset comprising approximately 242,000 examples spanning diverse domains.
| Hyperparameter | Value |
|---|---|
| Method | LoRA (Low-Rank Adaptation) |
| LoRA rank (r) | 32 |
| LoRA alpha | 64 |
| Dataset size | ~242K instruction-response pairs |
| Domains | STEM, Mathematics, Science, History, Geography, General Knowledge |
| Framework | HuggingFace TRL + PEFT |
| Hardware | NVIDIA A100 80GB PCIe |
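The LoRA settings above imply that only a small fraction of parameters is trained. A back-of-the-envelope sketch in plain Python (the 5120 hidden size and the choice of an attention projection are illustrative assumptions; the card does not list which modules were adapted):

```python
# LoRA replaces the frozen update to a d x k weight with B @ A, where
# A is (r x k) and B is (d x r), adding r * (d + k) trainable parameters
# per adapted matrix. The update is scaled by alpha / r.

def lora_params(d: int, k: int, r: int) -> int:
    """Trainable parameters added by one LoRA adapter on a d x k weight."""
    return r * (d + k)

r, alpha = 32, 64       # values from the table above
scaling = alpha / r     # effective update scale: 2.0

# Hypothetical square attention projection with hidden size 5120:
d = k = 5120
print(lora_params(d, k, r))  # 327680 adapter params vs. 26,214,400 in the full matrix
print(scaling)               # 2.0
```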
Stage 2: DPO Alignment (Direct Preference Optimization)
Following SFT, the model was further aligned using DPO on Turkish preference data to improve response quality and reduce undesirable outputs.
| Hyperparameter | Value |
|---|---|
| Method | DPO with LoRA |
| LoRA rank (r) | 32 |
| Beta | 0.1 |
| Learning rate | 5e-7 |
| Dataset | selimc/orpo-dpo-mix-TR-20k |
| Dataset size | 19.9K preference pairs |
| Framework | HuggingFace TRL + PEFT |
| Hardware | NVIDIA A100 80GB PCIe |
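The DPO objective behind these settings can be illustrated in a few lines. A minimal sketch of the per-pair loss in plain Python (the log-probability values below are made up for illustration, not taken from training):

```python
import math

def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """DPO loss for one preference pair: -log sigmoid(beta * margin),
    where the margin compares policy vs. reference log-prob ratios."""
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Illustrative log-probs: the policy prefers the chosen response more than
# the reference model does, so the loss falls below log(2) (~0.6931).
loss = dpo_loss(policy_chosen=-12.0, policy_rejected=-15.0,
                ref_chosen=-13.0, ref_rejected=-14.0)
print(round(loss, 4))  # 0.5981
```

A small beta (0.1 here) keeps the implicit reward scale gentle, so the policy is not pushed far from the SFT reference.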
Benchmark Results
All evaluations were conducted under identical conditions. Scores represent accuracy (%).
| Model | MMLU_TR | XCOPA_TR | XNLI_TR |
|---|---|---|---|
| Qwen2.5-14B-Instruct (base) | 59.47 | 66.80 | 41.53 |
| Turkish-LLM-14B v3 (SFT+DPO) | 59.42 | 66.00 | 43.33 |
| Turkish-LLM-14B v4 (SFT) | 59.76 | 64.60 | 41.53 |
| Turkish-LLM-14B-Instruct (this model, v5 SFT+DPO) | 59.94 | 64.80 | 41.53 |
Key Findings
- MMLU_TR: +0.47 points over the base model (59.47 -> 59.94), the highest improvement achieved across all Turkish fine-tuning experiments.
- XCOPA_TR: A trade-off of -2.0 points (66.80 -> 64.80) was observed, consistent with the shift toward STEM-focused training data. The XCOPA test set contains only 500 examples, making small score differences statistically marginal.
- XNLI_TR: Maintained at base model level (41.53), indicating no degradation on natural language inference.
- Multiple training strategies were explored (SFT, DPO, KTO, DARE-TIES merge) across 6 model versions to find the optimal configuration.
- Future model versions will incorporate continued pretraining on large-scale Turkish corpora to improve all benchmarks simultaneously.
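The claim that a 2-point swing on a 500-example test set is statistically marginal can be sanity-checked with a binomial standard-error estimate (a back-of-the-envelope calculation in plain Python, not a formal significance test):

```python
import math

def accuracy_se(p: float, n: int) -> float:
    """Standard error of an accuracy estimate p measured on n examples."""
    return math.sqrt(p * (1.0 - p) / n)

# XCOPA_TR: base model at 66.80% accuracy on 500 examples.
se = accuracy_se(0.668, 500)
print(round(100 * se, 2))  # 2.11 -- one standard error already spans
                           # the observed -2.0 point difference
```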
Model Family
Turkish-LLM is a family of instruction-tuned Turkish language models at multiple scales:
| Model | Parameters | Base Model | Status |
|---|---|---|---|
| Turkish-LLM-1.5B-Instruct | 1.5B | Qwen2.5-1.5B | Coming Soon |
| Turkish-LLM-3B-Instruct | 3B | Qwen2.5-3B | Coming Soon |
| Turkish-LLM-7B-Instruct | 7B | Turkcell-LLM-7b | Available |
| Turkish-LLM-14B-Instruct | 14.7B | Qwen2.5-14B | Available |
| Turkish-LLM-14B-Instruct-GGUF | 14.7B | Qwen2.5-14B | Available |
| Turkish-LLM-32B-Instruct | 32B | Qwen2.5-32B | Coming Soon |
Usage
1. Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "ogulcanaydogan/Turkish-LLM-14B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

messages = [
    {"role": "system", "content": "Sen yardımcı bir Türkçe yapay zeka asistanısın."},
    {"role": "user", "content": "Türkiye'nin en büyük gölü hangisidir?"}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# do_sample=True is required for temperature/top_p to take effect
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7, top_p=0.9)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```
2. vLLM (High-Performance Serving)
```python
from vllm import LLM, SamplingParams

llm = LLM(model="ogulcanaydogan/Turkish-LLM-14B-Instruct", dtype="bfloat16")
params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=512)
outputs = llm.generate(["Yapay zeka nedir?"], params)
print(outputs[0].outputs[0].text)
```
3. Ollama (Local Deployment)
```bash
# Download and run the GGUF quantized version
ollama run ogulcanaydogan/Turkish-LLM-14B-Instruct-GGUF
```
Limitations and Bias
While Turkish-LLM-14B-Instruct represents a meaningful step forward for Turkish NLP, users should be aware of the following limitations:
- Hallucination: Like all large language models, this model can generate plausible-sounding but factually incorrect information. It should not be used as a sole source of truth for critical applications.
- Training data scope: The SFT dataset (~242K examples) covers science, history, geography, and general knowledge but does not exhaustively represent all Turkish domains. Performance on highly specialized topics (e.g., legal, medical) may be limited.
- Bias: The model inherits biases present in both the base model's pretraining data and the Turkish fine-tuning data. Outputs may reflect societal biases, stereotypes, or cultural assumptions.
- Context length: The model supports a maximum context of 4,096 tokens. Inputs exceeding this length will be truncated.
- Turkish-centric: While the model retains multilingual capabilities from the Qwen2.5 base, it has been optimized specifically for Turkish. Performance on other languages may differ from the base model.
- Safety: Although DPO alignment reduces the likelihood of harmful outputs, no language model is fully safe. Users should implement additional safety measures for production deployments.
- Evaluation coverage: Benchmarks capture specific aspects of language understanding. Real-world performance may vary from benchmark scores depending on the use case.
We encourage users to evaluate the model on their specific use cases and to report any issues or concerns.
Citation
If you use Turkish-LLM-14B-Instruct in your research or applications, please cite:
```bibtex
@misc{aydogan2026turkishllm14b,
  title={Turkish-LLM-14B-Instruct: A Fine-Tuned Turkish Language Model with SFT and DPO},
  author={Ogulcan Aydogan},
  year={2026},
  url={https://huggingface.co/ogulcanaydogan/Turkish-LLM-14B-Instruct},
  note={Fine-tuned from Qwen/Qwen2.5-14B-Instruct with supervised fine-tuning and direct preference optimization for Turkish}
}
```
Turkish Summary
Turkish-LLM-14B-Instruct is a 14.7-billion-parameter Turkish language model built on the Qwen/Qwen2.5-14B-Instruct base model. It went through a two-stage training process:
- SFT (Supervised Fine-Tuning): trained on approximately 242,000 Turkish instruction-response pairs covering science, history, geography, and general knowledge.
- DPO (Direct Preference Optimization): model outputs were aligned using 19,900 Turkish preference pairs.
This two-stage approach produced measurable improvements on Turkish natural language understanding and generation tasks.
Why a Turkish Language Model?
Turkish, with over 80 million native speakers, is one of the most widely spoken languages in the world. Despite this, it is underrepresented in the large language model ecosystem: the vast majority of existing models are trained on English data, and their Turkish capabilities remain limited.
Turkish is a distinctive language with agglutinative morphology, vowel harmony, and SOV word order. Modeling these properties effectively requires Turkish-specific training data and fine-tuning procedures.
Comparative Results
| Model | MMLU_TR | XCOPA_TR | XNLI_TR |
|---|---|---|---|
| Qwen2.5-14B-Instruct (base) | 59.47 | 66.80 | 41.53 |
| Turkish-LLM-14B-Instruct (v5 SFT+DPO) | 59.94 | 64.80 | 41.53 |
- A +0.47 point improvement over the base model was achieved on MMLU_TR.
- A -2.0 point change was observed on XCOPA_TR due to the STEM-focused training data (statistically marginal on a 500-example test set).
- Continued pretraining on large-scale Turkish corpora is planned for future versions.
Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_adi = "ogulcanaydogan/Turkish-LLM-14B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_adi)
model = AutoModelForCausalLM.from_pretrained(
    model_adi,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

mesajlar = [
    {"role": "system", "content": "Sen yardımcı bir Türkçe yapay zeka asistanısın."},
    {"role": "user", "content": "Türkiye'nin en büyük gölü hangisidir?"}
]
metin = tokenizer.apply_chat_template(mesajlar, tokenize=False, add_generation_prompt=True)
girdiler = tokenizer(metin, return_tensors="pt").to(model.device)

# do_sample=True is required for temperature/top_p to take effect
ciktilar = model.generate(**girdiler, max_new_tokens=512, do_sample=True, temperature=0.7, top_p=0.9)
print(tokenizer.decode(ciktilar[0][girdiler.input_ids.shape[-1]:], skip_special_tokens=True))
```
GGUF builds are also available for local use:

```bash
ollama run ogulcanaydogan/Turkish-LLM-14B-Instruct-GGUF
```
Limitations
- Like all large language models, this model can produce incorrect or fabricated information.
- The training data covers specific domains; performance on specialist topics (e.g., law, medicine) may be limited.
- The maximum context length is 4,096 tokens.
- Additional safety measures are recommended for production deployments.
Developed by Ogulcan Aydogan