Update README.md
README.md
CHANGED
@@ -58,12 +58,12 @@ model-index:
 # SpeechLLM

 SpeechLLM is a multi-modal LLM trained to predict the metadata of the speaker's turn in a conversation. SpeechLLM model is based on HubertX acoustic encoder and TinyLlama LLM. The model predicts the following:
-1.
-2. ASR
-3. Gender of the speaker
-4. Age of the speaker
-5. Accent of the speaker
-6. Emotion of the speaker
+1. **SpeechActivity** : if the audio signal contains speech (True/False)
+2. **Transcript** : ASR transcript of the audio
+3. **Gender** of the speaker (Female/Male)
+4. **Age** of the speaker (Young/Middle-Age/Senior)
+5. **Accent** of the speaker (Africa/America/Celtic/Europe/Oceania/South-Asia/South-East-Asia)
+6. **Emotion** of the speaker (Happy/Sad/Anger/Neutral/Frustrated)

 ## Usage
 ```python