videoloc/seamless-langpairs

@@ -19,7 +19,7 @@ base_model: facebook/hf-seamless-m4t-medium
 This is a **SeamlessLanguagePairs** model that processes audio and text inputs with both translation awareness and language pair embeddings to predict **Time To Edit (TTE)** for subtitle segments. Given an audio segment and its corresponding subtitle text, the model predicts how much time (in seconds) would be required to edit/refine that subtitle segment, taking into account both whether the subtitle is translated and the specific language pair involved.
-The model extends the SeamlessM4T architecture with both translation features and language pair embeddings, providing the most granular control for multilingual video localization scenarios with support for 21 different language pairs.
+The model extends the SeamlessM4T architecture with both translation features and language pair embeddings, providing the most granular control for multilingual scenarios across **5 languages: English, French, Spanish, Italian, and German** with **21 different translation pairs** between them (e.g., EN→FR, ES→DE, IT→EN).
 ### Key Features
@@ -137,6 +137,14 @@ print(f"Predicted Time To Edit (TTE): {tte_prediction:.2f} seconds")
 - **Output**: Single regression value (TTE in seconds)
 - **Task**: Subtitle editing time prediction
+## Supported Language Pairs
+The model supports 21 specific translation pairs between 5 languages:
+**Languages**: English (EN), French (FR), Spanish (ES), Italian (IT), German (DE)
+**Translation Pairs**: Each pair is directional (e.g., EN→FR, FR→EN, ES→IT, DE→ES). The model uses language pair IDs 0-20 to identify the 21 supported translation directions, with ID 21 reserved for "other" pairs.
 ## Data Format
 Your input data should be a list of dictionaries with:
@@ -193,12 +201,12 @@ data = [
 The model was trained with the following specifications:
-- **Dataset**: Multimodal audio-subtitle pairs with translation and language pair annotations
+- **Dataset**: Multimodal audio-subtitle pairs with translation and language pair annotations (5 languages: EN, FR, ES, IT, DE with 21 pairs)
 - **Train/Test Split**: 80/20 with random seed 42
 - **Audio Processing**: 16kHz sampling, max 8.0 seconds, no offset
 - **Text Processing**: Max 256 tokens
 - **Translation Feature**: Binary flag indicating original vs translated content
-- **Language Pairs**: 21 most frequent language pairs plus "other" category
+- **Language Pairs**: 21 translation pairs from 5 languages (EN, FR, ES, IT, DE) plus "other" category
 - **Normalization**: None (raw TTE values in seconds)
 - **Caching**: Audio segments cached and compressed for efficiency
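The language-pair scheme described in the new "Supported Language Pairs" section (IDs 0-20 for known directions, ID 21 for "other") can be sketched as a simple lookup with a fallback. This is a hypothetical illustration only: the card does not publish which pair maps to which ID, so `build_pair_lookup`, `language_pair_id`, and the placeholder table in the usage lines are assumptions, not the model's actual mapping.

```python
# Hypothetical sketch of the language-pair ID scheme (IDs 0-20 known, 21 = "other").
# The real pair-to-ID ordering used by the model is NOT given in the card;
# callers must supply their own table, and the one below is a placeholder.
from typing import Dict, Sequence, Tuple

OTHER_PAIR_ID = 21  # per the card: ID 21 is reserved for "other" pairs

def build_pair_lookup(known_pairs: Sequence[Tuple[str, str]]) -> Dict[Tuple[str, str], int]:
    """Map each directional (source, target) pair to its index in `known_pairs`."""
    if len(known_pairs) > OTHER_PAIR_ID:
        # More than 21 entries would produce an ID that collides with "other".
        raise ValueError(f"at most {OTHER_PAIR_ID} known pairs allowed, got {len(known_pairs)}")
    return {(src.upper(), tgt.upper()): i for i, (src, tgt) in enumerate(known_pairs)}

def language_pair_id(src: str, tgt: str, lookup: Dict[Tuple[str, str], int]) -> int:
    """Return the pair's ID, falling back to OTHER_PAIR_ID for unknown directions."""
    return lookup.get((src.upper(), tgt.upper()), OTHER_PAIR_ID)

# Placeholder table for illustration only (not the model's real ordering).
placeholder_pairs = [("EN", "FR"), ("FR", "EN"), ("ES", "IT")]
lookup = build_pair_lookup(placeholder_pairs)
print(language_pair_id("en", "fr", lookup))  # 0
print(language_pair_id("DE", "ES", lookup))  # 21, i.e. "other" in this placeholder table
```

Because pairs are directional, EN→FR and FR→EN receive different IDs; the case-insensitive lookup is a convenience choice of this sketch.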
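The audio and split settings in the training specifications translate into concrete numbers: at 16 kHz, a maximum of 8.0 seconds with no offset means keeping at most the first 128,000 samples of each clip. The sketch below assumes the waveform is already a mono array resampled to 16 kHz, and it uses Python's `random` module for an 80/20 split with seed 42. The card does not say which splitting routine or padding strategy the training pipeline used, so this reproduces the stated proportions and seed but not necessarily the exact partition. The 256-token text limit would typically be applied by the tokenizer (for a Hugging Face tokenizer, `truncation=True, max_length=256`); that step is not shown.

```python
# Sketch of the preprocessing limits from the training specifications.
# Assumptions: mono waveform already resampled to 16 kHz; padding behavior and
# the exact split routine of the real pipeline are unknown and not reproduced.
import random
from typing import List, Sequence, Tuple

import numpy as np

SAMPLE_RATE = 16_000
MAX_AUDIO_SECONDS = 8.0
MAX_SAMPLES = int(SAMPLE_RATE * MAX_AUDIO_SECONDS)  # 128000 samples

def crop_audio(waveform: np.ndarray) -> np.ndarray:
    """Keep at most the first 8.0 s (offset 0); shorter clips are returned unchanged."""
    return waveform[:MAX_SAMPLES]

def split_80_20(items: Sequence, seed: int = 42) -> Tuple[List, List]:
    """Shuffle a copy with a fixed seed and cut it 80/20 into train/test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]

clip = np.zeros(10 * SAMPLE_RATE, dtype=np.float32)  # 10 s of silence
print(crop_audio(clip).shape)  # (128000,)
train, test = split_80_20(range(100))
print(len(train), len(test))  # 80 20
```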