---
language:
- cs
- en
tags:
- audio
- automatic-speech-recognition
- ctc
- wav2vec2-bert
- czech
license: mit
datasets:
- mozilla-foundation/common_voice_17_0
metrics:
- wer
---
# mitkaj/w2v2BERT-CZ-CV-17.0
This is a fine-tuned Wav2Vec2BERT model for Czech Automatic Speech Recognition (ASR) using CTC loss.
## Model Details
- **Base Model**: facebook/w2v-bert-2.0
- **Architecture**: Wav2Vec2BertForCTC
- **Training**: Fine-tuned on the Czech subset of Common Voice 17.0
- **Loss Function**: CTC (Connectionist Temporal Classification)
- **Vocab Size**: 51 tokens (see the configuration sketch below)
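As a rough illustration of these details, the CTC head can be attached with the standard `transformers` API. The snippet below is a sketch under assumed settings (`ctc_loss_reduction="mean"` in particular), not the author's actual configuration:
```python
from transformers import Wav2Vec2BertForCTC

# Sketch only: attach a randomly initialised 51-token CTC head to the base
# model; ctc_loss_reduction="mean" is an assumed (common) setting, not
# confirmed by the model card.
model = Wav2Vec2BertForCTC.from_pretrained(
    "facebook/w2v-bert-2.0",
    vocab_size=51,
    ctc_loss_reduction="mean",
)
```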
## Training Summary
- **Training Epochs**: 19.97
- **Final Training Loss**: 0.0305
- **Final Evaluation Loss**: 0.1450
- **Final WER**: 0.0583 (5.83%)
- **Total Training Time**: 5.1 hours
- **Total FLOPs**: 79819834495052513280 (≈ 8 × 10¹⁹)
## Usage
```python
import torch
import librosa  # for loading audio; any 16 kHz mono loader works
from transformers import AutoProcessor, AutoModelForCTC

# Load the fine-tuned model and its processor
processor = AutoProcessor.from_pretrained("mitkaj/w2v2BERT-CZ-CV-17.0")
model = AutoModelForCTC.from_pretrained("mitkaj/w2v2BERT-CZ-CV-17.0")

# Load a waveform resampled to 16 kHz ("audio.wav" is a placeholder path)
audio, _ = librosa.load("audio.wav", sr=16000)

# Extract input features
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

# Forward pass without gradients
with torch.no_grad():
    logits = model(**inputs).logits

# Greedy CTC decoding: pick the most likely token per frame, then collapse
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]
print(transcription)
```
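`batch_decode` above performs plain greedy decoding (frame-wise argmax followed by CTC collapsing of repeats and blanks). Beam-search decoding with an external language model, e.g. via `pyctcdecode`, may lower the WER further, but no such decoder ships with this checkpoint.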
## Training
The model was fine-tuned with CTC loss on the Czech portion of Common Voice for roughly 20 epochs (see the training summary above). CTC training needs audio features and label sequences padded independently, which is typically handled by a custom data collator.
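A minimal sketch of such a collator, following the common `transformers` CTC fine-tuning recipe (the `input_features`/`labels` field names are the conventional ones from that recipe, not confirmed from the author's training code):
```python
from dataclasses import dataclass
from typing import Dict, List, Union

import torch
from transformers import Wav2Vec2BertProcessor

@dataclass
class DataCollatorCTCWithPadding:
    """Pads audio features and labels separately, since their lengths differ."""
    processor: Wav2Vec2BertProcessor

    def __call__(self, features: List[Dict[str, Union[List[float], List[int]]]]) -> Dict[str, torch.Tensor]:
        input_features = [{"input_features": f["input_features"]} for f in features]
        label_features = [{"input_ids": f["labels"]} for f in features]

        batch = self.processor.pad(input_features, padding=True, return_tensors="pt")
        labels_batch = self.processor.pad(labels=label_features, padding=True, return_tensors="pt")

        # Mask label padding with -100 so the CTC loss ignores it
        batch["labels"] = labels_batch["input_ids"].masked_fill(
            labels_batch.attention_mask.ne(1), -100
        )
        return batch
```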
## Performance
The model was evaluated on held-out Czech test data using the Word Error Rate (WER) metric, reaching a final WER of 5.83%.
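For reference, WER can be reproduced with the Hugging Face `evaluate` library; the strings below are placeholders, not actual model outputs:
```python
import evaluate

# Load the WER metric
wer_metric = evaluate.load("wer")

# Placeholder hypothesis/reference pair; in practice, run the model over the
# test split and collect its transcriptions as `predictions`
wer = wer_metric.compute(
    predictions=["dobrý den jak se máte"],
    references=["dobrý den jak se máte"],
)
print(f"WER: {wer:.2%}")
```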
## Citation
If you use this model, please cite the original Wav2Vec2BERT paper and this fine-tuned version.