---
language:
- en
license: mit
tags:
- audio
- speech
- emotion-recognition
- tensorflow
- keras
- audio-classification
- ravdess
datasets:
- ravdess
metrics:
- accuracy
- f1
model-index:
- name: Speech Emotion Recognition
  results:
  - task:
      type: audio-classification
      name: Audio Classification
    dataset:
      type: ravdess
      name: RAVDESS
    metrics:
    - type: accuracy
      name: Accuracy
      value: "See confusion matrix"
pipeline_tag: audio-classification
library_name: tensorflow
---

# Speech Emotion Recognition Model

This model performs speech emotion recognition, classifying audio into 8 different emotional states.

## Model Description

This is a deep learning model trained to recognize emotions from speech audio. The model can classify audio into the following emotions:

- 😐 Neutral
- 😌 Calm
- 😊 Happy
- 😢 Sad
- 😠 Angry
- 😨 Fearful
- 🤢 Disgust
- 😲 Surprised

## Model Architecture

The model operates on a fixed-length vector of audio features extracted from each clip (a short sketch of the resulting dimensionality follows the list):

- MFCC (Mel-frequency cepstral coefficients)
- Chroma features
- Mel-spectrogram features
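
With librosa's defaults and `n_mfcc=40` (the settings used in the Usage snippet below), these three groups concatenate into 40 + 12 + 128 = 180 values per clip. A minimal sketch of that concatenation, assuming a mono clip loaded at 22,050 Hz; the `extract_feature` helper in the Usage section computes the same vector:

```python
import numpy as np
import librosa

# Load any mono clip at the sample rate used throughout this card.
data, sr = librosa.load("your_audio_file.wav", sr=22050)

# Each feature group is averaged over time, yielding one value per coefficient/bin.
mfccs = np.mean(librosa.feature.mfcc(y=data, sr=sr, n_mfcc=40).T, axis=0)   # 40 values
chroma = np.mean(librosa.feature.chroma_stft(y=data, sr=sr).T, axis=0)      # 12 values
mel = np.mean(librosa.feature.melspectrogram(y=data, sr=sr).T, axis=0)      # 128 values

feature = np.hstack([mfccs, chroma, mel])
print(feature.shape)  # (180,) -- the fixed-length vector fed to the classifier
```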

## Usage

```python
import librosa
import numpy as np
from tensorflow.keras.models import load_model

# Load the model
model = load_model('trained_model.h5')

# Load and preprocess audio
def extract_feature(data, sr, mfcc=True, chroma=True, mel=True):
    result = np.array([])
    if mfcc:
        mfccs = np.mean(librosa.feature.mfcc(y=data, sr=sr, n_mfcc=40).T, axis=0)
        result = np.hstack((result, mfccs))
    if chroma:
        stft = np.abs(librosa.stft(data))
        chroma_feat = np.mean(librosa.feature.chroma_stft(S=stft, sr=sr).T, axis=0)
        result = np.hstack((result, chroma_feat))
    if mel:
        mel_feat = np.mean(librosa.feature.melspectrogram(y=data, sr=sr).T, axis=0)
        result = np.hstack((result, mel_feat))
    return result

# Load audio file
audio_path = "your_audio_file.wav"
data, sr = librosa.load(audio_path, sr=22050)

# Extract features
feature = extract_feature(data, sr, mfcc=True, chroma=True, mel=True)
feature = np.expand_dims(feature, axis=0)
feature = np.expand_dims(feature, axis=2)

# Make prediction
prediction = model.predict(feature)
predicted_class = np.argmax(prediction, axis=1)

# Map to emotion labels
emotions = {
    0: 'Neutral',
    1: 'Calm',
    2: 'Happy',
    3: 'Sad',
    4: 'Angry',
    5: 'Fearful',
    6: 'Disgust',
    7: 'Surprised'
}

predicted_emotion = emotions[predicted_class[0]]
print(f"Predicted emotion: {predicted_emotion}")
```
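
A note on the reshaping above: the two `expand_dims` calls turn the 180-value feature vector into an array of shape `(1, 180, 1)`, i.e. a batch of one example with a trailing channel axis, which is the input shape the saved Keras model expects.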

## Requirements

```
librosa
tensorflow
numpy
scikit-learn
```
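
The dependencies can be installed with pip, for example:

```
pip install librosa tensorflow numpy scikit-learn
```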

## Training Data

The model was trained on the RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song) dataset, which contains speech emotion recordings with the following emotion categories:

- Neutral
- Calm
- Happy
- Sad
- Angry
- Fearful
- Disgust
- Surprised

The dataset provides high-quality audio recordings from multiple speakers, allowing the model to learn robust emotion recognition patterns across different voices and speaking styles.
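
RAVDESS encodes its labels directly in the filenames: seven two-digit fields separated by hyphens, with the third field holding the emotion code (01 = neutral through 08 = surprised). A minimal label-extraction sketch, assuming that standard naming convention:

```python
from pathlib import Path

# RAVDESS emotion codes (third hyphen-separated field of the filename).
RAVDESS_EMOTIONS = {
    "01": "Neutral", "02": "Calm", "03": "Happy", "04": "Sad",
    "05": "Angry", "06": "Fearful", "07": "Disgust", "08": "Surprised",
}

def label_from_filename(path):
    """Return the emotion label for a RAVDESS file such as 03-01-05-01-02-01-12.wav."""
    emotion_code = Path(path).stem.split("-")[2]
    return RAVDESS_EMOTIONS[emotion_code]

print(label_from_filename("03-01-05-01-02-01-12.wav"))  # Angry
```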

## Model Performance

The model has been trained and evaluated on RAVDESS; the training curves and confusion matrix below summarize its performance.

### Training Progress

![Training History](train_val_graph.png)

The training curves show the model's learning progress over epochs, demonstrating convergence and good generalization.

### Confusion Matrix

![Confusion Matrix](confusion_matrix.png)

The confusion matrix shows the model's performance on the RAVDESS dataset, demonstrating how well the model distinguishes between different emotional states.
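
To reproduce this kind of evaluation, here is a minimal sketch using scikit-learn (already listed in the requirements). `X_test` and `y_test` are assumed to be a held-out feature array of shape `(n_samples, 180, 1)` and integer labels 0-7 prepared with the same `extract_feature` pipeline, while `model` and `emotions` come from the Usage section:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, classification_report

# X_test, y_test, model and emotions are assumed from the Usage section / your own split.
y_pred = np.argmax(model.predict(X_test), axis=1)

print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred, target_names=list(emotions.values())))
```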

## License

This model is released under the MIT License, as declared in the metadata above.

## Citation

If you use this model, please cite:

```
@misc{speech-emotion-recognition,
  author = {JagjeevanAK},
  title = {Speech Emotion Recognition Model},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/JagjeevanAK/Speech-emotion-detection}
}
```