---
language:
- en
license: mit
tags:
- audio
- speech
- emotion-recognition
- tensorflow
- keras
- audio-classification
- ravdess
datasets:
- ravdess
metrics:
- accuracy
- f1
model-index:
- name: Speech Emotion Recognition
  results:
  - task:
      type: audio-classification
      name: Audio Classification
    dataset:
      type: ravdess
      name: RAVDESS
    metrics:
    - type: accuracy
      name: Accuracy
      value: "See confusion matrix"
pipeline_tag: audio-classification
library_name: tensorflow
---
# Speech Emotion Recognition Model
This model performs speech emotion recognition, classifying audio into 8 different emotional states.
## Model Description
This is a deep learning model trained to recognize emotions from speech audio. The model can classify audio into the following emotions:
- 😐 Neutral
- 😌 Calm
- 😊 Happy
- 😢 Sad
- 😠 Angry
- 😨 Fearful
- 🤢 Disgust
- 😲 Surprised
## Model Architecture
The model classifies fixed-length feature vectors extracted from the audio rather than raw waveforms. The following features are computed and averaged over time (a dimensionality sketch follows the list):
- MFCC (Mel-frequency cepstral coefficients)
- Chroma features
- Mel-spectrogram features
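Because each feature is averaged over the time axis, clips of any length produce a fixed-size vector. As a rough check (a minimal sketch assuming librosa's defaults of 12 chroma bins and 128 mel bands, plus the 40 MFCCs used below), the concatenated vector has 40 + 12 + 128 = 180 values:

```python
import librosa
import numpy as np

# Minimal sketch: verify per-feature dimensions with librosa defaults.
# Replace the path with any short mono audio clip.
data, sr = librosa.load("your_audio_file.wav", sr=22050)

mfccs = np.mean(librosa.feature.mfcc(y=data, sr=sr, n_mfcc=40).T, axis=0)
chroma = np.mean(librosa.feature.chroma_stft(y=data, sr=sr).T, axis=0)
mel = np.mean(librosa.feature.melspectrogram(y=data, sr=sr).T, axis=0)

print(mfccs.shape, chroma.shape, mel.shape)  # (40,), (12,), (128,) -> 180 total
```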
## Usage
```python
import librosa
import numpy as np
from tensorflow.keras.models import load_model

# Load the trained model
model = load_model('trained_model.h5')

# Extract a fixed-length feature vector by averaging each feature over time
def extract_feature(data, sr, mfcc=True, chroma=True, mel=True):
    result = np.array([])
    if mfcc:
        mfccs = np.mean(librosa.feature.mfcc(y=data, sr=sr, n_mfcc=40).T, axis=0)
        result = np.hstack((result, mfccs))
    if chroma:
        stft = np.abs(librosa.stft(data))
        chroma_feat = np.mean(librosa.feature.chroma_stft(S=stft, sr=sr).T, axis=0)
        result = np.hstack((result, chroma_feat))
    if mel:
        mel_feat = np.mean(librosa.feature.melspectrogram(y=data, sr=sr).T, axis=0)
        result = np.hstack((result, mel_feat))
    return result

# Load the audio file
audio_path = "your_audio_file.wav"
data, sr = librosa.load(audio_path, sr=22050)

# Extract features and reshape to (batch, features, channels) for the model
feature = extract_feature(data, sr, mfcc=True, chroma=True, mel=True)
feature = np.expand_dims(feature, axis=0)
feature = np.expand_dims(feature, axis=2)

# Make a prediction
prediction = model.predict(feature)
predicted_class = np.argmax(prediction, axis=1)

# Map the predicted index to an emotion label
emotions = {
    0: 'Neutral',
    1: 'Calm',
    2: 'Happy',
    3: 'Sad',
    4: 'Angry',
    5: 'Fearful',
    6: 'Disgust',
    7: 'Surprised'
}

predicted_emotion = emotions[predicted_class[0]]
print(f"Predicted emotion: {predicted_emotion}")
```
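Continuing from the snippet above, and assuming the model's final layer is a softmax over the eight classes, you can inspect the full probability distribution instead of only the top prediction:

```python
# Uses `prediction` and `emotions` from the previous snippet.
for idx, prob in enumerate(prediction[0]):
    print(f"{emotions[idx]:>9}: {prob:.3f}")
```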
## Requirements
```
librosa
tensorflow
numpy
scikit-learn
```
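All dependencies can be installed with pip:

```
pip install librosa tensorflow numpy scikit-learn
```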
## Training Data
The model was trained on the RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song) dataset, which contains recordings of emotional speech covering the following categories:
- Neutral
- Calm
- Happy
- Sad
- Angry
- Fearful
- Disgust
- Surprised
The dataset provides high-quality audio recordings from multiple speakers, allowing the model to learn robust emotion recognition patterns across different voices and speaking styles.
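For reference, RAVDESS encodes each recording's metadata in its filename as seven two-digit fields; the third field is the emotion code. A minimal sketch for recovering the label from a filename (assuming the standard RAVDESS naming scheme; `emotion_from_filename` is an illustrative helper, not part of this repo):

```python
from pathlib import Path

# Standard RAVDESS emotion codes (third two-digit field of the filename)
RAVDESS_EMOTIONS = {
    "01": "Neutral", "02": "Calm", "03": "Happy", "04": "Sad",
    "05": "Angry", "06": "Fearful", "07": "Disgust", "08": "Surprised",
}

def emotion_from_filename(path):
    code = Path(path).stem.split("-")[2]
    return RAVDESS_EMOTIONS[code]

print(emotion_from_filename("Actor_12/03-01-05-01-02-01-12.wav"))  # Angry
```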
## Model Performance
Model performance is summarized by the training curves and confusion matrix below.
### Training Progress
![Loss and Accuracy](loss%20and%20accuracy.png)
The training curves track loss and accuracy across epochs, showing how the model converged and how well it generalized beyond the training set.
### Confusion Matrix
![Confusion Matrix](Confusion-matrix-of-speaker-dependent-emotions-prediction-on-RAVDESS-corpus-with-8202.png)
The confusion matrix breaks down per-class predictions on the RAVDESS dataset, showing which emotions the model separates cleanly and which it tends to confuse.
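To compute your own confusion matrix on a held-out split, here is a minimal sketch with scikit-learn (already in the requirements); `y_true` and `y_pred` below are stand-in label arrays, not real evaluation results:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, classification_report

labels = ["Neutral", "Calm", "Happy", "Sad",
          "Angry", "Fearful", "Disgust", "Surprised"]

# Stand-in labels and predictions; replace with your own evaluation data.
y_true = np.array([0, 1, 2, 3, 4, 5, 6, 7, 4, 2])
y_pred = np.array([0, 1, 2, 3, 4, 5, 7, 7, 4, 2])

print(confusion_matrix(y_true, y_pred, labels=range(8)))
print(classification_report(y_true, y_pred, labels=range(8),
                            target_names=labels, zero_division=0))
```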
## License
This model is released under the MIT License.
## Citation
If you use this model, please cite:
```
@misc{speech-emotion-recognition,
  author = {JagjeevanAK},
  title = {Speech Emotion Recognition Model},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/JagjeevanAK/Speech-emotion-detection}
}
```