WhisperD-NL: Fine-tuned Whisper for Dutch Speech Recognition

WhisperD-NL is a Whisper model fine-tuned on the Corpus Gesproken Nederlands (CGN) to transcribe Dutch speech together with its disfluencies, speaker tags, and non-speech events.

Model Details

Base Model: openai/whisper-large-v3
Language: Dutch (nl)
Task: Automatic Speech Recognition
Fine-tuning: Corpus Gesproken Nederlands (CGN)
Speaker Identification: Up to four speakers are distinguished, each marked in the transcription with a tag ([S1], [S2], [S3] or [S4])
WER: 16.42, measured on transcriptions that include disfluencies, speaker identification and non-speech events (model based on whisper-large-v3)
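
For readers unfamiliar with the metric, word error rate is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The sketch below shows that standard definition in plain Python. It is not the evaluation script behind the figure above: the card does not say how the text was normalized or how tags were scored, so whitespace tokenization (which counts a tag such as [S1] as one word) is an assumption, and the function name and sample sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumption: plain whitespace tokenization, no text normalization.
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: the model drops the filler word "uh".
print(word_error_rate("[S1] ja uh dat klopt", "[S1] ja dat klopt"))
```

In this example one of five reference words is missing from the hypothesis, so a model that removes disfluencies is penalized even when the remaining words are correct.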

Usage

from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
import torch
import librosa

# Load model and processor
processor = AutoProcessor.from_pretrained("pevers/whisperd-nl")
model = AutoModelForSpeechSeq2Seq.from_pretrained("pevers/whisperd-nl")
model.eval()

# Load audio as mono and resample to 16 kHz, the rate the Whisper
# feature extractor expects (other sampling rates raise an error)
audio, sr = librosa.load("path_to_dutch_audio.wav", sr=16000, mono=True)
inputs = processor(audio, sampling_rate=sr, return_tensors="pt")

# Generate transcription
with torch.no_grad():
    predicted_ids = model.generate(inputs.input_features)

# Decode the first (and only) item in the batch
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)
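
Once a transcription is available, the speaker tags can be used to split it into per-speaker segments. The following is a minimal sketch that assumes each tag appears inline, immediately before the text spoken by that speaker, and that the tags survive decoding as plain text. The card confirms neither, so check the actual output first. The helper name `split_by_speaker`, the "unknown" label, and the sample string are hypothetical.

```python
import re

# Matches the four speaker tags listed under Model Details: [S1] to [S4].
SPEAKER_TAG = re.compile(r"\[(S[1-4])\]")


def split_by_speaker(transcription: str) -> list[tuple[str, str]]:
    """Return (speaker, text) pairs in order of appearance."""
    # With one capture group, re.split returns:
    # [text_before_first_tag, tag, text, tag, text, ...]
    parts = SPEAKER_TAG.split(transcription)

    segments = []
    leading = parts[0].strip()
    if leading:
        # Text before any tag has no known speaker.
        segments.append(("unknown", leading))
    for speaker, text in zip(parts[1::2], parts[2::2]):
        segments.append((speaker, text.strip()))
    return segments


# Hypothetical output string, not taken from the model.
example = "[S1] hoe gaat het [S2] ja uh goed"
for speaker, text in split_by_speaker(example):
    print(f"{speaker}: {text}")
```

If the tags turn out to be missing from the decoded text, decoding with `skip_special_tokens=False` is one thing to try, on the assumption that they were added to the tokenizer as special tokens.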

Limitations

Optimized specifically for Dutch speech that contains disfluencies and non-speech events
Inherits limitations from the base Whisper model

Model size: 2B params (Safetensors)
Tensor type: F32