Upload folder using huggingface_hub
--- a/README.md
+++ b/README.md
@@ -8,7 +8,7 @@ tags:
 - seamless
 - subtitle-editing-time-prediction
 library_name: transformers
-
+base_model: facebook/hf-seamless-m4t-medium
 ---
 
 # videoloc/seamless-basic
@@ -24,7 +24,6 @@ The model is built on top of Meta's SeamlessM4T and fine-tuned on a multimodal d
 - **Multimodal Processing**: Simultaneously processes audio (16kHz) and text inputs
 - **Frozen Encoders**: Uses pre-trained SeamlessM4T encoders (frozen for stability)
 - **TTE Prediction**: Predicts editing time required for subtitle segments
-- **Efficient Architecture**: Optimized for inference with gradient checkpointing support
 - **Direct Output**: Raw time values in seconds for immediate use
 
 ## Model Architecture
@@ -156,8 +155,6 @@ data = [
 - **Dataset Split**: 80/20 train/test
 - **Random Seed**: 42
 - **Metric**: RMSE (lower is better)
-- **Audio Caching**: Enabled with compression
-- **Workers**: 8
 
 ## Training Configuration
 
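The evaluation settings that remain in the README after this change (80/20 train/test split, random seed 42, RMSE as the metric) can be illustrated with a small standard-library sketch. The helper names `train_test_split` and `rmse` are hypothetical and are not taken from the model's repository; the shuffle-then-slice split is an assumption about how such a split might be done, not the authors' actual procedure.

```python
import math
import random


def train_test_split(items, test_fraction=0.2, seed=42):
    """Shuffle a copy of `items` with a fixed seed, then split into (train, test).

    Hypothetical helper: the README only states "80/20 train/test" and
    "Random Seed: 42"; the shuffling strategy here is an assumption.
    """
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences (lower is better)."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    # Illustrative values only: editing times in seconds, not real results.
    train, test = train_test_split(range(10))
    print(len(train), len(test))
    print(rmse([1.5, 2.0, 4.0], [1.0, 2.5, 3.0]))
```

Because the model emits raw time values in seconds, an RMSE computed this way is also in seconds, which makes it directly comparable to the predicted editing times.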