library_name: keras
tags:
- SpeakerRecognition
- Fast Fourier Transform (FFT)
- Convnet
- speech-recordings
- SpeechClassification
Model description
This model classifies speakers from the frequency-domain representation of their speech recordings, obtained via the Fast Fourier Transform (FFT). The model is a 1D convolutional network with residual connections for audio classification.
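The FFT step below is a minimal NumPy sketch of a frequency-domain input. It assumes each input is a 1-D array of samples at 16 kHz. It keeps only the magnitude of the first half of the spectrum, which holds the positive frequencies of a real signal. The function name `audio_to_fft_magnitude` is hypothetical, and this is not necessarily the exact preprocessing used to train this model.

```python
import numpy as np


def audio_to_fft_magnitude(audio: np.ndarray) -> np.ndarray:
    """Return the magnitude of the positive-frequency half of the FFT.

    `audio` is assumed to be a 1-D array of samples (e.g. one second at 16 kHz).
    """
    spectrum = np.fft.fft(audio)  # complex spectrum, same length as the input
    half = len(audio) // 2  # for real input, the second half mirrors the first
    return np.abs(spectrum[:half])


# Example: a 440 Hz tone sampled at 16 kHz for one second.
sampling_rate = 16000
t = np.arange(sampling_rate) / sampling_rate
tone = np.sin(2 * np.pi * 440 * t)
magnitudes = audio_to_fft_magnitude(tone)
print(magnitudes.shape, int(np.argmax(magnitudes)))  # (8000,) 440
```

With one second of audio at 16 kHz, each FFT bin spans 1 Hz. The 440 Hz tone therefore peaks at index 440.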
This repository contains the model for the Speaker Recognition notebook.
Full credit goes to Fadi Badine.
Dataset Used
This model uses a speaker recognition dataset from Kaggle.
Intended uses & limitations
The model should be run with TensorFlow 2.3 or higher, or with tf-nightly.
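You can check the version requirement before loading the model with a small helper like the one below. It is a hypothetical sketch: `meets_min_tf_version` is not part of TensorFlow, and it assumes nightly builds carry a suffix after the `major.minor.patch` prefix. Only the leading `major.minor` digits are compared.

```python
import re


def meets_min_tf_version(version: str, minimum=(2, 3)) -> bool:
    """Compare the major.minor part of a TensorFlow version string to `minimum`.

    Handles release strings such as "2.3.0" and nightly-style strings with a
    suffix by reading only the leading digits.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= tuple(minimum)


# Usage (requires TensorFlow to be installed):
#   import tensorflow as tf
#   assert meets_min_tf_version(tf.__version__), "TensorFlow 2.3+ is required"
```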
In addition, the noise samples in the dataset must be resampled to a sampling rate of 16000 Hz before they can be used with this model. To do this, you will need to have ffmpeg installed.
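The resampling step can be scripted roughly as below. This is a sketch under stated assumptions, not the notebook's own code. It assumes the noise files are uncompressed PCM WAV files that Python's `wave` module can read, and that `ffmpeg` is on `PATH`. The helper names and the temporary-file naming are hypothetical. The `-ar` flag sets ffmpeg's output audio sampling rate.

```python
import subprocess
import wave
from pathlib import Path

TARGET_RATE = 16000  # sampling rate the model expects


def needs_resampling(wav_path: Path, target_rate: int = TARGET_RATE) -> bool:
    """Read the WAV header with the standard library and compare sample rates."""
    with wave.open(str(wav_path), "rb") as wav:
        return wav.getframerate() != target_rate


def ffmpeg_resample_command(src: Path, dst: Path, target_rate: int = TARGET_RATE) -> list:
    """Build an ffmpeg command that re-encodes `src` at `target_rate` Hz into `dst`."""
    return [
        "ffmpeg", "-hide_banner", "-loglevel", "error",
        "-y",                     # overwrite `dst` if it already exists
        "-i", str(src),
        "-ar", str(target_rate),  # output audio sampling rate
        str(dst),
    ]


def resample_noise_dir(noise_dir: Path, target_rate: int = TARGET_RATE) -> None:
    """Resample every .wav file under `noise_dir` in place (requires ffmpeg on PATH)."""
    # sorted() materializes the file list first, so temporary files are not revisited.
    for wav_path in sorted(noise_dir.rglob("*.wav")):
        if not needs_resampling(wav_path, target_rate):
            continue
        tmp_path = wav_path.with_name(wav_path.stem + ".resampled.wav")
        subprocess.run(ffmpeg_resample_command(wav_path, tmp_path, target_rate), check=True)
        tmp_path.replace(wav_path)  # swap the resampled file in for the original
```

Files that are already at 16000 Hz are skipped, so running the function twice is harmless.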
Training and evaluation data
During dataset preparation, the speech samples and background noise samples were sorted into two folders, audio and noise. The noise samples were then resampled to 16000 Hz, and background noise was added to the speech samples to augment the data. Finally, the FFT of these samples was fed to the model for training and evaluation.
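The noise-augmentation step can be illustrated with the NumPy sketch below. The card does not say how the noise level was chosen. The peak-amplitude scaling rule, the `scale=0.5` default and the `add_noise` name are all assumptions for illustration. The sketch also assumes speech and noise clips have already been cut to the same length at the same sampling rate.

```python
import numpy as np


def add_noise(speech: np.ndarray, noise: np.ndarray, scale: float = 0.5) -> np.ndarray:
    """Mix a background-noise clip into a speech clip of the same length.

    The noise is rescaled so its peak amplitude matches the speech's peak, then
    multiplied by `scale`, so `scale` sets the noise peak relative to the speech peak.
    """
    if speech.shape != noise.shape:
        raise ValueError("speech and noise clips must have the same shape")
    noise_peak = np.max(np.abs(noise))
    if noise_peak == 0:
        return speech.copy()  # silent noise clip: nothing to add
    ratio = np.max(np.abs(speech)) / noise_peak
    return speech + noise * ratio * scale


rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)  # stand-in for a 1 s utterance
noise = rng.uniform(-0.1, 0.1, size=16000)                   # stand-in for a noise chunk
noisy = add_noise(speech, noise, scale=0.5)
```

Because the noise is scaled relative to each utterance's own peak, quiet and loud recordings receive proportionally similar amounts of noise.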
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
| optimizer | learning_rate | decay | beta_1 | beta_2 | epsilon | amsgrad | training_precision |
|---|---|---|---|---|---|---|---|
| Adam | 0.0010000000474974513 | 0.0 | 0.8999999761581421 | 0.9990000128746033 | 1e-07 | False | float32 |
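The unusual-looking values for `learning_rate`, `beta_1` and `beta_2` are most likely the float32 approximations of 0.001, 0.9 and 0.999, printed at double precision. This is consistent with the float32 training precision listed in the table. The snippet below reproduces the rounding with Python's `struct` module. It illustrates the conversion only and is not part of the training code.

```python
import struct


def as_float32(value: float) -> float:
    """Round-trip a Python float (float64) through IEEE 754 single precision."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


for name, value in [("learning_rate", 0.001), ("beta_1", 0.9), ("beta_2", 0.999)]:
    print(name, as_float32(value))
# learning_rate 0.0010000000474974513
# beta_1 0.8999999761581421
# beta_2 0.9990000128746033
```

Read this way, the table matches the Keras `Adam` defaults: `learning_rate=0.001`, `beta_1=0.9`, `beta_2=0.999`, `epsilon=1e-07` and `amsgrad=False`.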
Training Metrics
No training history was provided for this model, so training metrics are not available.
