---
license: cc-by-4.0
thumbnail: null
widget:
- example_title: Librispeech sample 1
  src: https://cdn-media.huggingface.co/speech_samples/sample1.flac
- example_title: Librispeech sample 2
  src: https://cdn-media.huggingface.co/speech_samples/sample2.flac
metrics:
- wer
tags:
- automatic-speech-recognition
- speech
- audio
- Transducer
- TDT
- FastConformer
- Conformer
- pytorch
- NeMo
- hf-asr-leaderboard
language:
- en
pipeline_tag: automatic-speech-recognition
library_name: nemo
base_model:
- nvidia/parakeet-tdt-0.6b-v2
---
Parakeet TDT 0.6B V2 - CoreML
This is a CoreML-optimized version of NVIDIA's Parakeet TDT 0.6B V2 model, designed for high-performance automatic speech recognition on Apple platforms.
Model Description
This model has been converted to CoreML format for efficient on-device inference on Apple Silicon Macs and iOS devices, enabling real-time speech recognition with a modest memory footprint (see Performance). The models will continue to evolve as we optimize performance and accuracy.
Usage in Swift
See the FluidAudio repository for Swift integration instructions.
Performance
- Real-time factor (RTF): < 0.3 on M1 Pro (processing time divided by audio duration)
- Memory usage: ~800 MB peak
- Supported platforms: macOS 14+, iOS 17+
- Optimized for: Apple Silicon
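To check the RTF figure on your own hardware, divide wall-clock processing time by the duration of the audio processed. An RTF below 0.3 means, for example, that a 60 s clip is transcribed in under 18 s. The Python sketch below is illustrative only: the helper names are hypothetical, and `transcribe` stands in for whatever CoreML or FluidAudio call you actually time.

```python
import time
from typing import Callable, Sequence

TARGET_RTF = 0.3  # figure quoted above for an M1 Pro
SAMPLE_RATE = 16_000  # model input sample rate (16 kHz)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = wall-clock processing time / duration of the audio processed."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def audio_duration(num_samples: int, sample_rate: int = SAMPLE_RATE) -> float:
    """Duration in seconds of a mono sample buffer."""
    return num_samples / sample_rate


def measure_rtf(transcribe: Callable[[Sequence[float]], str],
                samples: Sequence[float],
                sample_rate: int = SAMPLE_RATE) -> float:
    """Time one call to a hypothetical `transcribe` function and return its RTF."""
    start = time.perf_counter()
    transcribe(samples)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, audio_duration(len(samples), sample_rate))


def meets_target(rtf: float, target: float = TARGET_RTF) -> bool:
    """True when processing runs faster than the quoted target."""
    return rtf < target
```

For instance, `real_time_factor(15.0, 60.0)` gives 0.25, which is within the quoted target. A single timed call includes one-off costs such as model compilation and loading, so warming the model up before measuring gives a fairer number.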
Model Details
- Architecture: FastConformer-TDT (FastConformer encoder with a Token-and-Duration Transducer decoder)
- Parameters: 0.6B
- Sample rate: 16 kHz
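Audio recorded at other rates (for example 44.1 or 48 kHz) has to be converted to 16 kHz before inference. The Python sketch below shows the basic steps under stated assumptions: that the model consumes mono floating-point samples normalized to [-1, 1] (an assumption; check the FluidAudio repository for the exact input format), and that naive linear interpolation is acceptable for illustration. The function names are hypothetical.

```python
from typing import List, Sequence

TARGET_SAMPLE_RATE = 16_000  # 16 kHz, per Model Details above


def int16_to_float(samples: Sequence[int]) -> List[float]:
    """Scale signed 16-bit PCM values into the range [-1.0, 1.0)."""
    return [s / 32768.0 for s in samples]


def downmix_to_mono(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average each multi-channel frame into a single mono sample."""
    return [sum(frame) / len(frame) for frame in frames]


def resample_linear(samples: Sequence[float], src_rate: int,
                    dst_rate: int = TARGET_SAMPLE_RATE) -> List[float]:
    """Naive linear-interpolation resampler (no anti-aliasing filter)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    step = src_rate / dst_rate  # input samples advanced per output sample
    out: List[float] = []
    for i in range(n_out):
        pos = i * step
        left = int(pos)
        right = min(left + 1, len(samples) - 1)  # clamp at the last sample
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out
```

Linear interpolation without a low-pass filter aliases when downsampling, so treat it as a teaching aid. On Apple platforms, `AVAudioConverter` is the usual choice for production-quality sample-rate conversion.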
License
This model is released under the CC-BY-4.0 license. See the LICENSE file for details.
Acknowledgments
Based on NVIDIA's Parakeet TDT model. CoreML conversion and Swift integration by the FluidInference team.