giuseppe-tanzi committed on
Commit 8ad932f · verified · 1 Parent(s): 4fe5215

Upload folder using huggingface_hub

Files changed (1)
  1. README.md +11 -3
README.md CHANGED
@@ -19,7 +19,7 @@ base_model: facebook/hf-seamless-m4t-medium

  This is a **SeamlessLanguagePairs** model that processes audio and text inputs with both translation awareness and language pair embeddings to predict **Time To Edit (TTE)** for subtitle segments. Given an audio segment and its corresponding subtitle text, the model predicts how much time (in seconds) would be required to edit/refine that subtitle segment, taking into account both whether the subtitle is translated and the specific language pair involved.

- The model extends the SeamlessM4T architecture with both translation features and language pair embeddings, providing the most granular control for multilingual video localization scenarios with support for 21 different language pairs.

  ### Key Features

@@ -137,6 +137,14 @@ print(f"Predicted Time To Edit (TTE): {tte_prediction:.2f} seconds")
  - **Output**: Single regression value (TTE in seconds)
  - **Task**: Subtitle editing time prediction

  ## Data Format

  Your input data should be a list of dictionaries with:
@@ -193,12 +201,12 @@ data = [

  The model was trained with the following specifications:

- - **Dataset**: Multimodal audio-subtitle pairs with translation and language pair annotations
  - **Train/Test Split**: 80/20 with random seed 42
  - **Audio Processing**: 16kHz sampling, max 8.0 seconds, no offset
  - **Text Processing**: Max 256 tokens
  - **Translation Feature**: Binary flag indicating original vs translated content
- - **Language Pairs**: 21 most frequent language pairs plus "other" category
  - **Normalization**: None (raw TTE values in seconds)
  - **Caching**: Audio segments cached and compressed for efficiency

 

  This is a **SeamlessLanguagePairs** model that processes audio and text inputs with both translation awareness and language pair embeddings to predict **Time To Edit (TTE)** for subtitle segments. Given an audio segment and its corresponding subtitle text, the model predicts how much time (in seconds) would be required to edit/refine that subtitle segment, taking into account both whether the subtitle is translated and the specific language pair involved.

+ The model extends the SeamlessM4T architecture with both translation features and language pair embeddings, providing the most granular control for multilingual scenarios across **5 languages: English, French, Spanish, Italian, and German** with **21 different translation pairs** between them (e.g., EN→FR, ES→DE, IT→EN).

  ### Key Features

 
  - **Output**: Single regression value (TTE in seconds)
  - **Task**: Subtitle editing time prediction

+ ## Supported Language Pairs
+
+ The model supports 21 specific translation pairs between 5 languages:
+
+ **Languages**: English (EN), French (FR), Spanish (ES), Italian (IT), German (DE)
+
+ **Translation Pairs**: Directional pairs between the 5 languages (e.g., EN→FR, FR→EN, ES→IT, DE→ES). The model uses language pair IDs 0-20 to identify the 21 translation directions, with ID 21 reserved for "other" pairs.
+
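The ID scheme described above (IDs 0-20 for known pairs, ID 21 for "other") could be looked up roughly as in the sketch below. This is only an illustration: the README does not list the 21 pairs or the order in which IDs are assigned, so `KNOWN_PAIRS`, `PAIR_TO_ID`, and `language_pair_id` are hypothetical names, and the pair ordering is an assumption made for the example.

```python
# Hypothetical subset of known pairs, taken from the examples in the README.
# The real model's 21 pairs and their ID order are NOT given in the source,
# so the IDs produced here are illustrative only.
KNOWN_PAIRS = [("EN", "FR"), ("FR", "EN"), ("ES", "IT"), ("DE", "ES")]

# Per the README, ID 21 is reserved for pairs outside the known set.
OTHER_PAIR_ID = 21

# Map each (source, target) pair to its position in the list.
PAIR_TO_ID = {pair: idx for idx, pair in enumerate(KNOWN_PAIRS)}


def language_pair_id(source_lang: str, target_lang: str) -> int:
    """Return the ID for a directional pair, or OTHER_PAIR_ID if unknown."""
    key = (source_lang.strip().upper(), target_lang.strip().upper())
    return PAIR_TO_ID.get(key, OTHER_PAIR_ID)
```

Because the pairs are directional, `language_pair_id("EN", "FR")` and `language_pair_id("FR", "EN")` resolve to different IDs, and any pair missing from the table falls back to the "other" ID.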
  ## Data Format

  Your input data should be a list of dictionaries with:
 

  The model was trained with the following specifications:

+ - **Dataset**: Multimodal audio-subtitle pairs with translation and language pair annotations (5 languages: EN, FR, ES, IT, DE with 21 pairs)
  - **Train/Test Split**: 80/20 with random seed 42
  - **Audio Processing**: 16kHz sampling, max 8.0 seconds, no offset
  - **Text Processing**: Max 256 tokens
  - **Translation Feature**: Binary flag indicating original vs translated content
+ - **Language Pairs**: 21 translation pairs from 5 languages (EN, FR, ES, IT, DE) plus "other" category
  - **Normalization**: None (raw TTE values in seconds)
  - **Caching**: Audio segments cached and compressed for efficiency

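Two of the training specifications listed above, the 8.0-second audio cap at 16 kHz and the 80/20 split with seed 42, can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the training code itself: the helper names `crop_audio` and `train_test_split` are hypothetical, audio is assumed to be a flat sequence of samples, and the actual split may have been produced with a different shuffling routine.

```python
import random

SAMPLE_RATE_HZ = 16_000   # from the README: 16kHz sampling
MAX_AUDIO_SECONDS = 8.0   # from the README: max 8.0 seconds
MAX_SAMPLES = int(SAMPLE_RATE_HZ * MAX_AUDIO_SECONDS)


def crop_audio(samples, offset_seconds=0.0):
    """Keep at most MAX_SAMPLES samples, starting at the given offset.

    The README specifies "no offset", hence the default of 0.0.
    """
    start = int(offset_seconds * SAMPLE_RATE_HZ)
    return samples[start:start + MAX_SAMPLES]


def train_test_split(items, test_fraction=0.2, seed=42):
    """Shuffle deterministically, then split into (train, test) lists.

    Assumption: a seeded stdlib shuffle; the original split method is
    not described beyond "80/20 with random seed 42".
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]
```

Segments shorter than 8 seconds pass through `crop_audio` unchanged, and calling `train_test_split` twice with the same seed yields the same partition.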