skit-ai/speechllm-2B

shangeth committed on Jun 4, 2024

Commit 60d6884 (verified) · 1 parent: 183a43c

Update README.md

Files changed (1)

README.md +6 -6

@@ -58,12 +58,12 @@ model-index:
 # SpeechLLM
 SpeechLLM is a multi-modal LLM trained to predict the metadata of the speaker's turn in a conversation. SpeechLLM model is based on HubertX acoustic encoder and TinyLlama LLM. The model predicts the following:
-1. Speech Activity
-2. ASR Transcript
-3. Gender of the speaker
-4. Age of the speaker
-5. Accent of the speaker
-6. Emotion of the speaker
+1. **SpeechActivity** : if the audio signal contains speech (True/False)
+2. **Transcript** : ASR transcript of the audio
+3. **Gender** of the speaker (Female/Male)
+4. **Age** of the speaker (Young/Middle-Age/Senior)
+5. **Accent** of the speaker (Africa/America/Celtic/Europe/Oceania/South-Asia/South-East-Asia)
+6. **Emotion** of the speaker (Happy/Sad/Anger/Neutral/Frustrated)
 ## Usage
 ```python