EgypTalk-ASR-v2

NAMAA-Space/EgypTalk-ASR-v2 is a high-performance automatic speech recognition (ASR) model for Egyptian Arabic, trained using NVIDIA NeMo and optimized for real-world speech from native Egyptian speakers.

The model was trained on over 200 hours of high-quality, manually curated audio data collected and prepared by the NAMAA team. It is built upon NVIDIA’s FastConformer Hybrid Large architecture and fine-tuned for Egyptian Arabic, enabling highly accurate transcription in casual, formal, and mixed-dialect settings.

Demo: Try it here

🗣️ Model Description

  • Architecture: FastConformer Hybrid Large from NVIDIA NeMo ASR collection.
  • Framework: PyTorch Lightning + NVIDIA NeMo.
  • Languages: Egyptian Arabic (with some ability to handle Modern Standard Arabic).
  • Dataset: 200+ hours of proprietary, high-quality audio for Egyptian Arabic, covering:
    • Spontaneous conversation
    • Broadcast media
    • Interviews
    • Read speech
  • Tokenizer: SentencePiece (trained specifically for Egyptian Arabic phonetic coverage).
  • Input Format: 16 kHz mono WAV files (a preprocessing sketch follows this list).
  • Output: Raw transcribed text in Arabic.
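
Because the model expects 16 kHz mono input, audio in other formats should be converted first. Below is a minimal preprocessing sketch using torchaudio; the function name and paths are illustrative, not part of the model card.

```python
import torchaudio

def to_16k_mono(in_path: str, out_path: str) -> str:
    """Convert an arbitrary audio file to the 16 kHz mono WAV this model expects."""
    waveform, sr = torchaudio.load(in_path)            # (channels, samples)
    if waveform.shape[0] > 1:
        waveform = waveform.mean(dim=0, keepdim=True)  # downmix to mono
    if sr != 16000:
        waveform = torchaudio.functional.resample(waveform, sr, 16000)
    torchaudio.save(out_path, waveform, 16000)
    return out_path
```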

🚀 Key Features

  • Egyptian Arabic Dialect Optimized – Designed to handle local pronunciations, colloquialisms, and speech patterns.
  • High Accuracy – Achieves a low word error rate (WER) on Egyptian Arabic test sets (see the scoring example after this list).
  • FastConformer Efficiency – Low-latency, streaming-capable inference.
  • Robust Dataset – Covers multiple domains (media, conversation, formal speech).
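
Word error rate is the standard ASR metric: WER = (substitutions + deletions + insertions) / number of reference words. The snippet below scores a hypothesis against a reference transcript with the jiwer library; the sentences are made-up examples, not from the model's test set.

```python
import jiwer

reference = "صباح الخير يا جماعة"   # hypothetical ground-truth transcript (4 words)
hypothesis = "صباح الخير جماعة"     # hypothetical model output (one deletion)

# WER = (substitutions + deletions + insertions) / reference word count
print(f"WER: {jiwer.wer(reference, hypothesis):.2%}")  # 1 deletion / 4 words = 25%
```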

💻 Usage

```python
from nemo.collections.asr.models import ASRModel

# Load the model from the Hugging Face Hub
model = ASRModel.from_pretrained("NAMAA-Space/EgypTalk-ASR-v2")
model.eval()

# Transcribe one or more 16 kHz mono WAV files
transcription = model.transcribe(["sample.wav"])
print(transcription)
```
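
transcribe() takes a list of file paths and returns one transcript per file. For long recordings, one simple workaround is to split the audio into fixed windows and transcribe each one. The helper below is only a sketch (chunk boundaries can split words, and NeMo's buffered/streaming inference utilities are the more robust option); it assumes the soundfile package and 16 kHz mono input.

```python
import os
import tempfile

import soundfile as sf

def transcribe_long_audio(model, path: str, chunk_s: float = 30.0) -> str:
    """Naively split a long recording into fixed windows and transcribe each."""
    audio, sr = sf.read(path)
    chunk = int(chunk_s * sr)
    texts = []
    for start in range(0, len(audio), chunk):
        segment = audio[start:start + chunk]
        with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
            sf.write(tmp.name, segment, sr)
            tmp_path = tmp.name
        # transcribe() returns a list with one transcript per input file
        texts.extend(model.transcribe([tmp_path]))
        os.remove(tmp_path)
    return " ".join(texts)
```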

🛠️ Training Details

  • Pretrained Base Model: nvidia/stt_ar_fastconformer_hybrid_large_pcd_v1.0
  • Training Framework: PyTorch Lightning (DDP strategy)
  • Training Duration: 100 epochs, mixed precision enabled
  • Optimizer: Adam with learning rate 1e-3
  • Batch Size: 32 (train) / 8 (validation, test)
  • Preprocessing & Augmentation: Silence trimming, start/end token usage
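
The listed hyperparameters map directly onto a NeMo fine-tuning loop. Below is a minimal sketch of what such a run could look like; the manifest paths are hypothetical placeholders (NeMo expects JSON-lines manifests with audio_filepath, duration, and text fields), and the exact precision/strategy flags depend on your PyTorch Lightning version.

```python
import pytorch_lightning as pl
from nemo.collections.asr.models import ASRModel

# Start from the same pretrained base model used for EgypTalk-ASR-v2
model = ASRModel.from_pretrained("nvidia/stt_ar_fastconformer_hybrid_large_pcd_v1.0")

model.setup_training_data(train_data_config={
    "manifest_filepath": "train_manifest.json",  # hypothetical path
    "sample_rate": 16000,
    "batch_size": 32,      # train batch size, as listed above
    "shuffle": True,
})
model.setup_validation_data(val_data_config={
    "manifest_filepath": "val_manifest.json",    # hypothetical path
    "sample_rate": 16000,
    "batch_size": 8,       # validation batch size, as listed above
    "shuffle": False,
})
model.setup_optimization(optim_config={"name": "adam", "lr": 1e-3})

trainer = pl.Trainer(
    accelerator="gpu",
    devices=-1,
    strategy="ddp",        # DDP strategy, as listed above
    precision="16-mixed",  # mixed precision (Lightning 2.x flag)
    max_epochs=100,
)
model.set_trainer(trainer)
trainer.fit(model)
```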

Citation

```bibtex
@misc{namaa2025egyptalk,
  title={NAMAA-Space/EgypTalk-ASR-v2},
  author={NAMAA},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/NAMAA-Space/EgypTalk-ASR-v2}},
  note={Accessed: 2025-03-02}
}
```