videoloc/seamless-basic

giuseppe-tanzi committed on Jun 16

Commit a6453d8 · verified · 1 Parent(s): 14dc36f

Upload folder using huggingface_hub

Files changed (1)

README.md +2 -2


@@ -17,7 +17,7 @@ base_model: facebook/hf-seamless-m4t-medium
 This is a **SeamlessBasic** model that processes audio and text inputs to predict **Time To Edit (TTE)** for subtitle segments. Given an audio segment and its corresponding subtitle text, the model predicts how much time (in seconds) would be required to edit/refine that subtitle segment.
-The model is built on top of Meta's SeamlessM4T and fine-tuned on a multimodal dataset containing audio-subtitle pairs with editing time annotations.
+The model is built on top of Meta's SeamlessM4T and fine-tuned on a multimodal dataset containing audio-subtitle pairs with editing time annotations across 5 languages: **English, French, Spanish, Italian, and German**.
 ### Key Features
@@ -164,7 +164,7 @@ data = [
 The model was trained with the following specifications:
-- **Dataset**: Multimodal audio-subtitle pairs with TTE annotations
+- **Dataset**: Multimodal audio-subtitle pairs with TTE annotations (5 languages: EN, FR, ES, IT, DE)
 - **Train/Test Split**: 80/20 with random seed 42
 - **Audio Processing**: 16kHz sampling, max 8.0 seconds, no offset
 - **Text Processing**: Max 256 tokens
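
The training specifications in the README (16 kHz audio capped at 8.0 seconds with no offset, and an 80/20 split with seed 42) can be illustrated with a minimal sketch. The repository's actual preprocessing and splitting code is not shown in this commit, so the helper names `crop_audio` and `train_test_split` below are hypothetical. The shuffle uses Python's standard `random` module, which will not reproduce the author's exact split even with the same seed.

```python
import random

SAMPLE_RATE = 16_000   # Hz, as stated in the README
MAX_SECONDS = 8.0      # maximum clip length, as stated in the README
MAX_SAMPLES = int(SAMPLE_RATE * MAX_SECONDS)


def crop_audio(samples, offset_seconds=0.0):
    """Keep at most MAX_SECONDS of audio, starting at offset_seconds.

    `samples` is assumed to be a sequence of mono samples at SAMPLE_RATE.
    The default offset of 0.0 mirrors the README's "no offset".
    """
    start = int(offset_seconds * SAMPLE_RATE)
    return samples[start:start + MAX_SAMPLES]


def train_test_split(items, test_fraction=0.2, seed=42):
    """Shuffle a copy of `items` with a fixed seed, then cut it 80/20.

    Returns (train, test). This is an illustrative stand-in, not the
    splitting routine used to train the model.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

For example, a 10-second clip passed to `crop_audio` is shortened to 128,000 samples (8.0 s × 16,000 Hz), and splitting 10 items yields 8 for training and 2 for testing.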