WhisperD-NL: Fine-tuned Whisper for Dutch Speech Recognition

WhisperD-NL is a Whisper model fine-tuned on the Corpus Gesproken Nederlands (CGN) to transcribe Dutch speech while also capturing disfluencies, speakers and non-speech events.

Model Details

  • Base Model: openai/whisper-large-v3
  • Language: Dutch (nl)
  • Task: Automatic Speech Recognition
  • Fine-tuning: Corpus Gesproken Nederlands (CGN)
  • Speaker Identification: Up to four speakers are distinguished, each marked with a tag ([S1], [S2], [S3] or [S4])
  • WER: 16.42 on transcripts that include disfluencies, speaker tags and non-speech events, for the model fine-tuned from whisper-large-v3
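
For reference, WER is conventionally the word-level edit distance (substitutions, deletions and insertions) divided by the number of reference words. The sketch below assumes whitespace tokenization and treats tags such as [S1] as ordinary words. The model card does not say how the 16.42 figure was normalized or scored, so this is illustrative only, and the Dutch sentences are made-up examples.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: the filler "uh" is dropped by the hypothesis.
print(word_error_rate("[S1] ja uh dat klopt", "[S1] ja dat klopt"))  # 0.2
```

Because disfluencies and tags count as words under this assumption, a model that drops fillers is penalized even when the remaining words are correct.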

Usage

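from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
import librosa
import torch
import soundfile as sf

# Load model and processor
processor = AutoProcessor.from_pretrained("pevers/whisperd-nl")
model = AutoModelForSpeechSeq2Seq.from_pretrained("pevers/whisperd-nl")

# Load audio, downmix to mono and resample to the 16 kHz that Whisper expects
audio, sr = sf.read("path_to_dutch_audio.wav", dtype="float32")
if audio.ndim > 1:
    audio = audio.mean(axis=1)
if sr != 16000:
    audio = librosa.resample(audio, orig_sr=sr, target_sr=16000)
    sr = 16000
inputs = processor(audio, sampling_rate=sr, return_tensors="pt")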

# Generate transcription
with torch.no_grad():
    predicted_ids = model.generate(inputs.input_features)
    
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)
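
If the speaker tags appear literally in the decoded text, the transcript can be split into per-speaker turns. This is a sketch under that assumption: the exact output format is not shown here, and `split_speaker_turns` is a hypothetical helper, not part of the model or of transformers. Should the tags have been registered as special tokens, `skip_special_tokens=True` would strip them before this step.

```python
import re

# Matches the four speaker tags named in the model details: [S1] .. [S4].
SPEAKER_TAG = re.compile(r"\[(S[1-4])\]")


def split_speaker_turns(transcription: str) -> list[tuple[str, str]]:
    """Return (speaker, text) pairs in the order they occur."""
    # With one capture group, split() yields:
    # [text_before_first_tag, tag_1, text_1, tag_2, text_2, ...]
    parts = SPEAKER_TAG.split(transcription)
    turns = []

    leading = parts[0].strip()
    if leading:
        # Text before any tag has no known speaker.
        turns.append(("unknown", leading))

    for tag, text in zip(parts[1::2], parts[2::2]):
        text = text.strip()
        if text:
            turns.append((tag, text))
    return turns


# Hypothetical transcript, for illustration only.
for speaker, text in split_speaker_turns("[S1] hallo hoe gaat het [S2] ja goed"):
    print(f"{speaker}: {text}")
```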

Limitations

  • Optimized specifically for Dutch speech containing disfluencies and non-speech events
  • Inherits limitations from the base Whisper model
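
One inherited limitation is that Whisper works on 30-second input windows, so longer recordings have to be split or handled with a long-form decoding strategy. Below is a minimal sketch of fixed-length chunking, assuming 16 kHz mono samples in any sliceable sequence (a list or a NumPy array). `chunk_audio` is a hypothetical helper, and naive cuts like these can split words or speaker turns at the boundaries.

```python
def chunk_audio(audio, sr=16000, chunk_seconds=30.0):
    """Split a sample sequence into consecutive chunks of at most chunk_seconds."""
    chunk_len = int(sr * chunk_seconds)
    if chunk_len <= 0:
        raise ValueError("sr * chunk_seconds must be at least one sample")
    # The final chunk may be shorter than chunk_len.
    return [audio[start:start + chunk_len] for start in range(0, len(audio), chunk_len)]


# 70 seconds of silence at 16 kHz yields chunks of 30 s, 30 s and 10 s.
chunks = chunk_audio([0.0] * (16000 * 70))
print([len(c) / 16000 for c in chunks])  # [30.0, 30.0, 10.0]
```

Each chunk can then be passed through the processor and `model.generate` separately, with the decoded texts joined in order.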