Upload folder using huggingface_hub
README.md
CHANGED
@@ -17,7 +17,7 @@ base_model: facebook/hf-seamless-m4t-medium

 This is a **SeamlessBasic** model that processes audio and text inputs to predict **Time To Edit (TTE)** for subtitle segments. Given an audio segment and its corresponding subtitle text, the model predicts how much time (in seconds) would be required to edit/refine that subtitle segment.

-The model is built on top of Meta's SeamlessM4T and fine-tuned on a multimodal dataset containing audio-subtitle pairs with editing time annotations
+The model is built on top of Meta's SeamlessM4T and fine-tuned on a multimodal dataset containing audio-subtitle pairs with editing time annotations across 5 languages: **English, French, Spanish, Italian, and German**.

 ### Key Features

@@ -164,7 +164,7 @@ data = [

 The model was trained with the following specifications:

-- **Dataset**: Multimodal audio-subtitle pairs with TTE annotations
+- **Dataset**: Multimodal audio-subtitle pairs with TTE annotations (5 languages: EN, FR, ES, IT, DE)
 - **Train/Test Split**: 80/20 with random seed 42
 - **Audio Processing**: 16kHz sampling, max 8.0 seconds, no offset
 - **Text Processing**: Max 256 tokens