Upload folder using huggingface_hub
README.md
CHANGED
@@ -17,7 +17,7 @@ base_model: facebook/hf-seamless-m4t-medium
 
 This is a **SeamlessBasic** model that processes audio and text inputs to predict **Time To Edit (TTE)** for subtitle segments. Given an audio segment and its corresponding subtitle text, the model predicts how much time (in seconds) would be required to edit/refine that subtitle segment.
 
-The model is built on top of Meta's SeamlessM4T and fine-tuned on a multimodal dataset containing audio-subtitle pairs with editing time annotations
+The model is built on top of Meta's SeamlessM4T and fine-tuned on a multimodal dataset containing audio-subtitle pairs with editing time annotations across 5 languages: **English, French, Spanish, Italian, and German**.
 
 ### Key Features
 
@@ -164,7 +164,7 @@ data = [
 
 The model was trained with the following specifications:
 
-- **Dataset**: Multimodal audio-subtitle pairs with TTE annotations
+- **Dataset**: Multimodal audio-subtitle pairs with TTE annotations (5 languages: EN, FR, ES, IT, DE)
 - **Train/Test Split**: 80/20 with random seed 42
 - **Audio Processing**: 16kHz sampling, max 8.0 seconds, no offset
 - **Text Processing**: Max 256 tokens
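
The training specifications listed in the README (80/20 split with seed 42, 16 kHz audio capped at 8.0 seconds with no offset, text capped at 256 tokens) could be sketched roughly as follows. This is a minimal standard-library illustration, not the author's actual pipeline: the helper names `split_train_test`, `truncate_audio`, and `truncate_tokens` are hypothetical, and the real preprocessing presumably relies on the SeamlessM4T processor rather than plain list slicing.

```python
import random

# Values taken from the README's training specifications.
SAMPLE_RATE_HZ = 16_000
MAX_AUDIO_SECONDS = 8.0
MAX_TEXT_TOKENS = 256
TRAIN_FRACTION = 0.8
SEED = 42


def split_train_test(items, train_fraction=TRAIN_FRACTION, seed=SEED):
    """Deterministically shuffle a copy of `items` and split it (hypothetical helper)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def truncate_audio(samples, sample_rate=SAMPLE_RATE_HZ, max_seconds=MAX_AUDIO_SECONDS):
    """Keep audio from the start of the clip (no offset) up to `max_seconds`."""
    max_samples = int(sample_rate * max_seconds)
    return samples[:max_samples]


def truncate_tokens(token_ids, max_tokens=MAX_TEXT_TOKENS):
    """Cap a token-id sequence at `max_tokens` entries."""
    return token_ids[:max_tokens]
```

With these assumptions, a 16 kHz clip is cut to at most 128,000 samples, and the same seed always reproduces the same train/test partition.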