🕌 Tadabur — Quran Speech Recognition

Fine-tuned Whisper Medium on the Tadabur dataset for Quran ASR, Surah/Ayah identification, and reciter recognition.

CS465 Machine Learning Project — Spring 2026


What This Model Does

Given a Quran audio recitation, the pipeline returns:

  1. Arabic transcription: 6.26% WER on 500 held-out test samples
  2. Surah & Ayah identification: fuzzy-matched against all 6,236 ayahs
  3. Reciter name: identified from 335 supported reciters at 98.47% validation accuracy

Performance

ASR Results (500 held-out test samples)

| Model | WER (%) | CER (%) |
|---|---|---|
| Whisper Medium Vanilla | 41.10 | 11.47 |
| Tadabur-Whisper-Small (Author) | 47.06 | 12.28 |
| This model | 6.26 | 4.41 |
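For readers unfamiliar with the two metrics, here is a minimal sketch of how WER and CER are conventionally computed: edit distance between reference and hypothesis, divided by the reference length, over words for WER and characters for CER. The exact text normalization used for the numbers above (diacritics, punctuation, whitespace) is not stated in this card, so treat this as an illustration rather than the evaluation script. The function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists of words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits divided by the number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits divided by reference length.
    Assumption: spaces count as characters here."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    reference = "الحمد لله رب العالمين"
    hypothesis = "الحمد لله العالمين"  # one word dropped
    print(f"WER: {wer(reference, hypothesis):.2%}")
    print(f"CER: {cer(reference, hypothesis):.2%}")
```

Both functions assume a non-empty reference. Corpus-level scores are usually obtained by summing edits and reference lengths across all samples before dividing, not by averaging per-sample rates.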

Reciter Classifier

| Metric | Value |
|---|---|
| Supported reciters | 335 |
| Validation accuracy | 98.47% |
| Training accuracy | 98.71% |

Files in This Repository

| File | Size | Description |
|---|---|---|
| model.safetensors | 3.06 GB | Fine-tuned Whisper Medium weights |
| reciter_classifier.pt | 2.76 MB | MLP reciter classifier |
| reciter_idx_to_id.json | 1.25 KB | Classifier index → reciter ID |
| reciter_id_to_idx.json | 1.25 KB | Reciter ID → classifier index |
| sheikh_dict.json | 2.7 KB | Reciter ID → Arabic name |
| surah_dict.json | 2.7 KB | Surah index → Arabic name |
| quran_simple.json | ~3 MB | Full Quran text for matching |
| supported_reciters.txt | (not listed) | List of all 335 supported reciters |

Quick Start

Install

pip install transformers torch librosa rapidfuzz huggingface_hub

Transcription only

from transformers import WhisperProcessor, WhisperForConditionalGeneration
import librosa, torch

MODEL = "rakansuliman/tadabur-whisper-medium"
processor = WhisperProcessor.from_pretrained(MODEL)
model = WhisperForConditionalGeneration.from_pretrained(MODEL)
model.eval()

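# librosa resamples to 16 kHz mono, the input format Whisper expects
audio, _ = librosa.load("recitation.wav", sr=16000)
# The processor pads or truncates the audio to Whisper's 30-second window
inputs = processor(audio, sampling_rate=16000, return_tensors="pt").input_features

with torch.no_grad():
    ids = model.generate(
        inputs,
        language="arabic",
        task="transcribe",
        max_new_tokens=225,
        suppress_tokens=[],       # disable default token suppression
        forced_decoder_ids=None,  # clear any saved forced IDs so language/task above apply
    )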

print(processor.batch_decode(ids, skip_special_tokens=True)[0])

Full pipeline (transcription + reciter)

from huggingface_hub import hf_hub_download
import torch, torch.nn as nn, json

MODEL = "rakansuliman/tadabur-whisper-medium"

# Download classifier files
hf_hub_download(MODEL, "reciter_classifier.pt",  local_dir="./")
hf_hub_download(MODEL, "reciter_idx_to_id.json", local_dir="./")
hf_hub_download(MODEL, "sheikh_dict.json",        local_dir="./")

# Define classifier (must match training architecture)
class ReciterClassifier(nn.Module):
    def __init__(self, hidden_dim, num_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_dim, 512), nn.BatchNorm1d(512), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(512, 256), nn.BatchNorm1d(256), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(256, num_classes),
        )
    def forward(self, x): return self.net(x)

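# Load mappings
with open("reciter_idx_to_id.json") as f:
    # classifier output index → reciter ID
    idx_to_id = {int(k): int(v) for k, v in json.load(f).items()}
with open("sheikh_dict.json", encoding="utf-8-sig") as f:  # utf-8-sig tolerates a leading BOM
    # invert the stored key/value pairs so the lookup is reciter ID → name
    sheikh = {int(v): k for k, v in json.load(f).items()}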

# Load classifier
clf = ReciterClassifier(1024, len(idx_to_id))
clf.load_state_dict(torch.load("reciter_classifier.pt", map_location="cpu"))
clf.eval()

# Run encoder + classify reciter
# (reuses `model` and `inputs` from the "Transcription only" snippet above)
with torch.no_grad():
    encoder_out = model.model.encoder(inputs)
    embedding   = encoder_out.last_hidden_state.mean(dim=1).float()
    logits      = clf(embedding)
    pred_idx    = logits.argmax(dim=1).item()
    confidence  = torch.softmax(logits, dim=1).max().item()

reciter_id   = idx_to_id[pred_idx]
reciter_name = sheikh.get(reciter_id, f"ID {reciter_id}")
print(f"Reciter: {reciter_name} ({confidence*100:.1f}%)")

Architecture

Audio Input (mic / file / video)
    ↓
Whisper Encoder  ←─ runs once, shared
    ├── Whisper Decoder  →  Arabic text
    └── MLP Classifier   →  Reciter name
    ↓
RapidFuzz matching against 6,236 ayahs
    ↓
Surah name + Ayah number + confidence
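
The matching stage at the bottom of the diagram can be sketched as follows. This is a dependency-free illustration, not the project's code. It swaps RapidFuzz for the standard library's `difflib.SequenceMatcher` (scores in 0 to 1 rather than RapidFuzz's 0 to 100), and uses a hypothetical three-ayah corpus because the layout of quran_simple.json is not documented in this card. The `normalize` step, which strips diacritics and tatweel before comparison, is also an assumption about what helps matching.

```python
import difflib
import unicodedata


def normalize(text: str) -> str:
    """Drop combining marks (Arabic diacritics), remove tatweel, collapse whitespace."""
    no_marks = "".join(ch for ch in text if unicodedata.category(ch) != "Mn")
    return " ".join(no_marks.replace("\u0640", "").split())


# Hypothetical miniature corpus of (surah number, ayah number, text).
# The real pipeline would load all 6,236 ayahs from quran_simple.json instead.
AYAHS = [
    (1, 1, "بسم الله الرحمن الرحيم"),
    (1, 2, "الحمد لله رب العالمين"),
    (112, 1, "قل هو الله أحد"),
]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; stand-in for a RapidFuzz scorer."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def match_ayah(transcript: str, corpus=AYAHS):
    """Return (surah, ayah, score) for the corpus entry most similar to the transcript."""
    query = normalize(transcript)
    scored = [
        (similarity(query, normalize(text)), surah, ayah)
        for surah, ayah, text in corpus
    ]
    score, surah, ayah = max(scored)
    return surah, ayah, score


if __name__ == "__main__":
    # A transcript with one missing letter still resolves to Al-Fatiha, ayah 2.
    print(match_ayah("الحمد لله رب العلمين"))
```

A linear scan like this is fine for a sketch. Over the full Quran text, a purpose-built library such as RapidFuzz is considerably faster.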

Reciter Classifier Architecture

Linear(1024→512) → BatchNorm → ReLU → Dropout(0.3)
    → Linear(512→256) → BatchNorm → ReLU → Dropout(0.2)
    → Linear(256→335)

Training Details

ASR Fine-tuning

  • Base model: openai/whisper-medium
  • Dataset: 9,432 samples (1 shard of Tadabur)
  • Hardware: NVIDIA RTX 4090 (24 GB VRAM)
  • Batch size: 8 × 4 gradient accumulation = 32 effective
  • Learning rate: 1e-5, cosine schedule with 500 warmup steps
  • Precision: FP16
  • Best checkpoint: step 10,000
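
To make the batch-size and learning-rate settings above concrete, here is a small sketch of the arithmetic. Two things are assumptions, not facts from this card: that the warmup is linear (the common default for cosine schedules), and that the schedule spans 10,000 steps. The card only says the best checkpoint was at step 10,000, so `TOTAL_STEPS` is a placeholder.

```python
import math

PER_STEP_BATCH = 8       # from the list above
GRAD_ACCUM_STEPS = 4     # from the list above
BASE_LR = 1e-5           # from the list above
WARMUP_STEPS = 500       # from the list above
TOTAL_STEPS = 10_000     # ASSUMPTION: real schedule length is not stated

# Gradients from 4 micro-batches of 8 are accumulated before each optimizer step.
effective_batch = PER_STEP_BATCH * GRAD_ACCUM_STEPS


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup (assumed), then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("effective batch:", effective_batch)
    for step in (0, 250, 500, 5_250, 10_000):
        print(f"step {step:>6}: lr = {lr_at(step):.2e}")
```

Under these assumptions the rate peaks at 1e-5 right after warmup and falls to half of that midway through the decay.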

Reciter Classifier

  • Training data: 500 shards (~325k samples, 335 reciters)
  • Phase 1: Extract Whisper encoder embeddings shard-by-shard
  • Phase 2: Train MLP on pre-extracted embeddings (about 15 minutes)
  • Optimizer: AdamW with cosine annealing
  • Epochs: 20, Batch size: 256

Supported Reciters

See supported_reciters.txt for the full list of 335 supported reciters including: عبد الباسط عبد الصمد، محمد صديق المنشاوي، ياسر الدوسري، سعود الشريم، ماهر المعيقلي، عبدالرحمن السديس، and 329 more.


Limitations

  • ASR was fine-tuned on a single shard (9,432 samples), so generalization to rare recitation styles may be reduced
  • Reciter classifier covers 335 of 671 total reciters in the dataset
  • Surah/Ayah matching accuracy depends on transcription quality
  • Model optimized for standard Hafs recitation style
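
Because the classifier always picks one of its 335 classes, a recitation by an unsupported reciter will still receive a label. One possible mitigation, sketched below, is to report "unknown" when the softmax confidence falls under a threshold. The threshold value and the helper names are hypothetical, and softmax confidence is not a calibrated probability, so any cutoff should be tuned on held-out data.

```python
import math

MIN_CONFIDENCE = 0.9  # hypothetical cutoff; tune on validation data


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_or_unknown(logits, threshold=MIN_CONFIDENCE):
    """Return (class index, confidence), with index None when confidence is below threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return None, probs[best]
    return best, probs[best]


if __name__ == "__main__":
    print(predict_or_unknown([10.0, 0.0, 0.0]))  # confident: class 0
    print(predict_or_unknown([0.1, 0.0, 0.0]))   # near-uniform: reported as unknown
```

With the pipeline above, the input would be `logits[0].tolist()` from the classifier output.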

Citation

@misc{suliman2026tadabur,
  author = {Suliman, Rakan and Mamdoh, Abdulrahman and Aldosari, Hussam and Khalid, Mohammed},
  title  = {Tadabur: Quran ASR with Surah/Ayah Identification and Reciter Recognition},
  year   = {2026},
  url    = {https://huggingface.co/rakansuliman/tadabur-whisper-medium}
}

License

CC BY-NC 4.0 — Research and educational use only. Please engage with Quran content respectfully. 🤲
