diff --git a/README.md b/README.md
index 9bf15984fb25cf3a09c3a803e2b0cd8a68293f77..e4ac093c4a4ef046d967830511f628d0329e572b 100644
--- a/README.md
+++ b/README.md
@@ -2,91 +2,76 @@
license: apache-2.0
base_model: Qwen/Qwen2.5-7B
library_name: peft
+language:
+- fr
tags:
- text-to-speech
-- ssml
-- qwen2.5
- lora
- peft
-language:
-- en
-- fr
+- ssml
+- qwen2.5
pipeline_tag: text-generation
---
-# 🗣️ ssml-text2breaks-fr-lora
-
-**ssml-text2breaks-fr-lora** is a LoRA adapter built on top of `Qwen/Qwen2.5-7B`, trained to predict **symbolic pause markers** (e.g., `#250`, `#500`) in raw French text. These symbolic tags indicate appropriate prosodic boundaries for speech synthesis systems.
+# 🗣️ French Text-to-Breaks LoRA Model
-This model is the **first stage** in the cascaded pipeline presented in:
+**hi-paris/ssml-text2breaks-fr-lora** is a LoRA adapter fine-tuned on Qwen2.5-7B to predict natural pause locations in French text by inserting symbolic break markers.
-> **"Improving French Synthetic Speech Quality via SSML Prosody Control"**
-> *Nassima Ould-Ouali, Éric Moulines* – ICNLSP 2025 (*Springer LNCS*, accepted)
+This is the **first stage** of a two-step SSML cascade pipeline that improves prosody control in French text-to-speech.
-It is designed to be followed by [`ssml-break2ssml-fr-lora`](https://huggingface.co/nassimaODL/ssml-break2ssml-fr-lora), which converts symbolic markers into valid SSML tags.
-
----
+> 📄 **Paper**: *"Improving Synthetic Speech Quality via SSML Prosody Control"*
+> **Authors**: Nassima Ould-Ouali, Awais Sani, Ruben Bueno, Jonah Dauvet, Tim Luka Horstmann, Eric Moulines
+> **Conference**: ICNLSP 2025
+> 🔗 **Demo & Audio Samples**: https://horstmann.tech/ssml-prosody-control/
## 🧩 Pipeline Overview
-| Stage | Model Name | Description |
-|-------|------------|-------------|
-| 1️⃣ | `ssml-text2breaks-fr-lora` | Predicts symbolic pause markers such as `#250`, `#500` |
-| 2️⃣ | `ssml-break2ssml-fr-lora` | Converts symbolic markers into `` SSML tags |
-
----
+| Stage | Model | Purpose |
+|-------|-------|---------|
+| 1️⃣ | **hi-paris/ssml-text2breaks-fr-lora** | Predicts natural pause locations |
+| 2️⃣ | [hi-paris/ssml-breaks2ssml-fr-lora](https://huggingface.co/hi-paris/ssml-breaks2ssml-fr-lora) | Converts breaks to full SSML with prosody |
## ✨ Example
**Input:**
-
-```text
-Bonjour je m'appelle Bertrand Perier. Je suis avocat à la cour.
-
```
-
-**Output**
-```text
-Bonjour#250 je m'appelle Bertrand Perier.#500 Je suis avocat à la cour.
-
+Bonjour comment allez-vous aujourd'hui ?
```
+**Output:**
+```
+Bonjour comment allez-vous aujourd'hui ?
+```
+## 🚀 Quick Start
+### Installation
-## 🧠 Model Details
-
-- **Base Model**: Qwen/Qwen2.5-7B
-- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
-- **LoRA Rank**: 8
-- **LoRA Alpha**: 16
-- **Target Modules**: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
-- **Training Epochs**: 5
-- **Batch Size**: 1 (with gradient accumulation)
-- **Learning Rate**: 3e-4
+```bash
+pip install torch transformers peft accelerate
+```
-## 🚀 How to run the code
+### Basic Usage
```python
-import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
+import torch
# Load base model and tokenizer
base_model = AutoModelForCausalLM.from_pretrained(
"Qwen/Qwen2.5-7B",
- torch_dtype=torch.bfloat16,
+ torch_dtype=torch.float16,
device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B")
# Load LoRA adapter
-model = PeftModel.from_pretrained(base_model, "jonahdvt/qwen-ssml-lora")
+model = PeftModel.from_pretrained(base_model, "hi-paris/ssml-text2breaks-fr-lora")
# Prepare input
-instruction = "Convert text to SSML with pauses:"
-text = "Hello, how are you today? I hope everything is going well."
-formatted_input = f"### Task:\n{instruction}\n\n### Text:\n{text}\n\n### SSML:\n"
+text = "Bonjour comment allez-vous aujourd'hui ?"
+formatted_input = f"### Task:\nConvert text to SSML with pauses:\n\n### Text:\n{text}\n\n### SSML:\n"
# Generate
inputs = tokenizer(formatted_input, return_tensors="pt").to(model.device)
@@ -100,25 +85,70 @@ with torch.no_grad():
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
-ssml_output = response.split("### SSML:\n")[-1]
-print(ssml_output)
+result = response.split("### SSML:\n")[-1].strip()
+print(result)  # the input text with predicted break markers inserted
+```
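The prompt template and the output extraction used in the example above can be factored into two small helpers, which keeps the string handling in one place and makes it testable without loading the 7B model. This is a minimal sketch; `build_prompt`, `extract_result`, and the constant names are hypothetical, and the template string is copied verbatim from the Basic Usage code.

```python
# Prompt template copied from the Basic Usage example.
PROMPT_TEMPLATE = "### Task:\nConvert text to SSML with pauses:\n\n### Text:\n{text}\n\n### SSML:\n"
RESPONSE_MARKER = "### SSML:\n"


def build_prompt(text: str) -> str:
    """Wrap raw French text in the instruction template used at inference."""
    return PROMPT_TEMPLATE.format(text=text)


def extract_result(decoded: str) -> str:
    """Keep only what follows the last '### SSML:' header.

    A causal LM echoes the prompt, so the decoded sequence contains the
    template followed by the generated continuation.
    """
    return decoded.split(RESPONSE_MARKER)[-1].strip()


if __name__ == "__main__":
    prompt = build_prompt("Bonjour comment allez-vous aujourd'hui ?")
    # Simulated decoded sequence: prompt echo plus a placeholder continuation.
    fake_decoded = prompt + "generated text\n"
    print(extract_result(fake_decoded))  # generated text
```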
+
+### Production Usage (Recommended)
+
+For production use, including memory-optimized loading and the full two-stage cascade, see our [inference repository](https://github.com/TimLukaHorstmann/cascading_model):
+
+```python
+from text2breaks_inference import Text2BreaksInference
+
+# Memory-efficient shared model approach
+model = Text2BreaksInference()
+result = model.predict("Bonjour comment allez-vous aujourd'hui ?")
```
-## Citation
-If you use this model in your research, please cite:
-```text
+## 🔧 Full Cascade Example
+
+```python
+from breaks2ssml_inference import CascadedInference
+
+# Initialize full pipeline (memory efficient)
+cascade = CascadedInference()
+
+# Convert plain text directly to full SSML
+text = "Bonjour comment allez-vous aujourd'hui ?"
+ssml_output = cascade.predict(text)
+print(ssml_output)
+# Output: the input text rendered as SSML with break and prosody tags
+```
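The internals of `CascadedInference` live in the linked repository and are not documented in this card. One way to obtain a similar memory saving with plain PEFT is to load the base model once, attach both LoRA adapters to it, and switch between them with `set_adapter`. The sketch below assumes that approach (it is not necessarily what the repository does); the adapter names `"text2breaks"` and `"breaks2ssml"`, the helper names, and the greedy decoding settings are illustrative choices. The stage-2 prompt format is not given in this card, so the caller has to supply it.

```python
def load_shared_adapters(base_id: str = "Qwen/Qwen2.5-7B"):
    """Load the base model once and attach both cascade adapters to it."""
    # Imports are local so this module can be imported without the ML stack.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained(
        base_id, torch_dtype=torch.float16, device_map="auto"
    )
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    # Stage 1 adapter; "text2breaks" is an arbitrary local name.
    model = PeftModel.from_pretrained(
        base, "hi-paris/ssml-text2breaks-fr-lora", adapter_name="text2breaks"
    )
    # Stage 2 adapter reuses the same frozen base weights.
    model.load_adapter("hi-paris/ssml-breaks2ssml-fr-lora", adapter_name="breaks2ssml")
    model.eval()
    return model, tokenizer


def run_stage(model, tokenizer, adapter_name: str, prompt: str, max_new_tokens: int = 256) -> str:
    """Activate one adapter and return only the newly generated text."""
    import torch

    model.set_adapter(adapter_name)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Drop the echoed prompt tokens before decoding.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```

For stage 1, `run_stage(model, tokenizer, "text2breaks", formatted_input)` would reuse the prompt built in Basic Usage. Only one copy of the 7B base weights is held in memory; each adapter adds roughly the size of its LoRA checkpoint.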
+
+## 🧠 Model Details
+
+- **Base Model**: [Qwen/Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B)
+- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
+- **LoRA Rank**: 8, Alpha: 16
+- **Target Modules**: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
+- **Training**: 5 epochs, batch size 1 with gradient accumulation
+- **Language**: French
+- **Model Size**: 7B-parameter base model; LoRA adapter ~81MB
+- **License**: Apache 2.0
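As a sanity check on the ~81MB adapter size, the number of trainable LoRA parameters follows from the rank and the shapes of the target modules: each adapted linear layer of shape (in, out) gains matrices A (rank × in) and B (out × rank). This back-of-the-envelope sketch assumes Qwen2.5-7B's published dimensions (hidden size 3584, intermediate size 18944, 28 layers, 28 attention heads, 4 key/value heads) and that only the listed modules are trained, with no bias terms.

```python
# Assumed Qwen2.5-7B dimensions (from its public config.json).
HIDDEN = 3584
INTERMEDIATE = 18944
LAYERS = 28
NUM_HEADS = 28
NUM_KV_HEADS = 4
HEAD_DIM = HIDDEN // NUM_HEADS    # 128
KV_DIM = NUM_KV_HEADS * HEAD_DIM  # 512 (grouped-query attention)

RANK = 8

# (in_features, out_features) of each LoRA target module.
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(rank: int, targets: dict, layers: int) -> int:
    """Count LoRA parameters: A is rank x in, B is out x rank, per module."""
    per_layer = sum(rank * (fin + fout) for fin, fout in targets.values())
    return per_layer * layers


n = lora_params(RANK, TARGETS, LAYERS)
print(n)  # 20185088
print(f"fp32: {n * 4 / 1e6:.1f} MB, fp16/bf16: {n * 2 / 1e6:.1f} MB")  # fp32: 80.7 MB, fp16/bf16: 40.4 MB
```

About 20.2M trainable parameters, which matches the stated ~81MB if the adapter weights are stored in fp32.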
+
+## 📊 Performance
+
+The model predicts natural pause locations in French text; combined with the second-stage model, it contributes to improved prosody in text-to-speech synthesis. See the paper cited below for the evaluation.
+
+## 🔗 Resources
+
+- **Full Pipeline Code**: https://github.com/TimLukaHorstmann/cascading_model
+- **Interactive Demo**: [Colab Notebook](https://colab.research.google.com/drive/1bFcbJQY9OuY0_zlscqkf9PIgd3dUrIKs?usp=sharing)
+- **Stage 2 Model**: [hi-paris/ssml-breaks2ssml-fr-lora](https://huggingface.co/hi-paris/ssml-breaks2ssml-fr-lora)
+
+## 📖 Citation
+
+```bibtex
@inproceedings{ould-ouali2025_improving,
title = {Improving Synthetic Speech Quality via SSML Prosody Control},
author = {Ould-Ouali, Nassima and Sani, Awais and Bueno, Ruben and Dauvet, Jonah and Horstmann, Tim Luka and Moulines, Eric},
- booktitle = {Proceedings of the 8th International Conference on Natural Language and Speech Processing (ICNLSP)}, % TODO: vérifier l'intitulé exact utilisé par la conf
+ booktitle = {Proceedings of the 8th International Conference on Natural Language and Speech Processing (ICNLSP)},
year = {2025},
- pages = {XX--YY}, % TODO
- publisher = {—}, % TODO
- address = {—} % TODO
+ url = {https://huggingface.co/hi-paris}
}
```
+## 📜 License
-## License
-
-This model is released under the Apache 2.0 license, same as the base Qwen2.5-7B model.
+Apache 2.0 License (same as the base Qwen2.5-7B model)
diff --git a/added_tokens.json b/added_tokens.json
deleted file mode 100644
index 482ced4679301bf287ebb310bdd1790eb4514232..0000000000000000000000000000000000000000
--- a/added_tokens.json
+++ /dev/null
@@ -1,24 +0,0 @@
-{
- "</tool_call>": 151658,
- "<tool_call>": 151657,
- "<|box_end|>": 151649,
- "<|box_start|>": 151648,
- "<|endoftext|>": 151643,
- "<|file_sep|>": 151664,
- "<|fim_middle|>": 151660,
- "<|fim_pad|>": 151662,
- "<|fim_prefix|>": 151659,
- "<|fim_suffix|>": 151661,
- "<|im_end|>": 151645,
- "<|im_start|>": 151644,
- "<|image_pad|>": 151655,
- "<|object_ref_end|>": 151647,
- "<|object_ref_start|>": 151646,
- "<|quad_end|>": 151651,
- "<|quad_start|>": 151650,
- "<|repo_name|>": 151663,
- "<|video_pad|>": 151656,
- "<|vision_end|>": 151653,
- "<|vision_pad|>": 151654,
- "<|vision_start|>": 151652
-}
diff --git a/bigscience_T0_3B_ssml/added_tokens.json b/bigscience_T0_3B_ssml/added_tokens.json
deleted file mode 100644
index de0c64dcab6084255716c769d34fb186045bd8c4..0000000000000000000000000000000000000000
--- a/bigscience_T0_3B_ssml/added_tokens.json
+++ /dev/null
@@ -1,105 +0,0 @@
-{
- "": 32101,
- "": 32102,
- "": 32099,
- "": 32089,
- "": 32088,
- "": 32087,
- "": 32086,
- "": 32085,
- "": 32084,
- "": 32083,
- "": 32082,
- "": 32081,
- "": 32080,
- "": 32098,
- "": 32079,
- "": 32078,
- "": 32077,
- "": 32076,
- "": 32075,
- "": 32074,
- "": 32073,
- "": 32072,
- "": 32071,
- "": 32070,
- "": 32097,
- "": 32069,
- "": 32068,
- "": 32067,
- "": 32066,
- "": 32065,
- "": 32064,
- "": 32063,
- "": 32062,
- "": 32061,
- "": 32060,
- "": 32096,
- "": 32059,
- "": 32058,
- "": 32057,
- "": 32056,
- "": 32055,
- "": 32054,
- "": 32053,
- "": 32052,
- "": 32051,
- "": 32050,
- "": 32095,
- "": 32049,
- "": 32048,
- "": 32047,
- "": 32046,
- "": 32045,
- "": 32044,
- "": 32043,
- "": 32042,
- "": 32041,
- "": 32040,
- "": 32094,
- "": 32039,
- "": 32038,
- "": 32037,
- "": 32036,
- "": 32035,
- "": 32034,
- "": 32033,
- "": 32032,
- "": 32031,
- "": 32030,
- "": 32093,
- "": 32029,
- "": 32028,
- "": 32027,
- "": 32026,
- "": 32025,
- "": 32024,
- "": 32023,
- "": 32022,
- "": 32021,
- "": 32020,
- "": 32092,
- "": 32019,
- "": 32018,
- "": 32017,
- "": 32016,
- "": 32015,
- "": 32014,
- "": 32013,
- "": 32012,
- "": 32011,
- "": 32010,
- "": 32091,
- "": 32009,
- "": 32008,
- "": 32007,
- "": 32006,
- "": 32005,
- "": 32004,
- "": 32003,
- "": 32002,
- "": 32001,
- "": 32000,
- "": 32090,
- "": 32100
-}
diff --git a/bigscience_T0_3B_ssml/checkpoint-12/added_tokens.json b/bigscience_T0_3B_ssml/checkpoint-12/added_tokens.json
deleted file mode 100644
index de0c64dcab6084255716c769d34fb186045bd8c4..0000000000000000000000000000000000000000
--- a/bigscience_T0_3B_ssml/checkpoint-12/added_tokens.json
+++ /dev/null
@@ -1,105 +0,0 @@
-{
- "": 32101,
- "": 32102,
- "": 32099,
- "": 32089,
- "": 32088,
- "": 32087,
- "": 32086,
- "": 32085,
- "": 32084,
- "": 32083,
- "": 32082,
- "": 32081,
- "": 32080,
- "": 32098,
- "": 32079,
- "": 32078,
- "": 32077,
- "": 32076,
- "": 32075,
- "": 32074,
- "": 32073,
- "": 32072,
- "": 32071,
- "": 32070,
- "": 32097,
- "": 32069,
- "": 32068,
- "": 32067,
- "": 32066,
- "": 32065,
- "": 32064,
- "": 32063,
- "": 32062,
- "": 32061,
- "": 32060,
- "": 32096,
- "": 32059,
- "": 32058,
- "": 32057,
- "": 32056,
- "": 32055,
- "": 32054,
- "": 32053,
- "": 32052,
- "": 32051,
- "": 32050,
- "": 32095,
- "": 32049,
- "": 32048,
- "": 32047,
- "": 32046,
- "": 32045,
- "": 32044,
- "": 32043,
- "": 32042,
- "": 32041,
- "": 32040,
- "": 32094,
- "": 32039,
- "": 32038,
- "": 32037,
- "": 32036,
- "": 32035,
- "": 32034,
- "": 32033,
- "": 32032,
- "": 32031,
- "": 32030,
- "": 32093,
- "": 32029,
- "": 32028,
- "": 32027,
- "": 32026,
- "": 32025,
- "": 32024,
- "": 32023,
- "": 32022,
- "": 32021,
- "": 32020,
- "": 32092,
- "": 32019,
- "": 32018,
- "": 32017,
- "": 32016,
- "": 32015,
- "": 32014,
- "": 32013,
- "": 32012,
- "": 32011,
- "": 32010,
- "": 32091,
- "": 32009,
- "": 32008,
- "": 32007,
- "": 32006,
- "": 32005,
- "": 32004,
- "": 32003,
- "": 32002,
- "": 32001,
- "": 32000,
- "": 32090,
- "": 32100
-}
diff --git a/bigscience_T0_3B_ssml/checkpoint-12/config.json b/bigscience_T0_3B_ssml/checkpoint-12/config.json
deleted file mode 100644
index 7755feea8fbd37ad95d418ecca1a02b297f6f9f3..0000000000000000000000000000000000000000
--- a/bigscience_T0_3B_ssml/checkpoint-12/config.json
+++ /dev/null
@@ -1,32 +0,0 @@
-{
- "architectures": [
- "T5ForConditionalGeneration"
- ],
- "classifier_dropout": 0.0,
- "d_ff": 5120,
- "d_kv": 64,
- "d_model": 2048,
- "decoder_start_token_id": 0,
- "dense_act_fn": "gelu_new",
- "dropout_rate": 0.1,
- "eos_token_id": 1,
- "feed_forward_proj": "gated-gelu",
- "gradient_checkpointing": false,
- "initializer_factor": 1.0,
- "is_encoder_decoder": true,
- "is_gated_act": true,
- "layer_norm_epsilon": 1e-06,
- "model_type": "t5",
- "num_decoder_layers": 24,
- "num_heads": 32,
- "num_layers": 24,
- "output_past": true,
- "pad_token_id": 0,
- "relative_attention_max_distance": 128,
- "relative_attention_num_buckets": 32,
- "tie_word_embeddings": false,
- "torch_dtype": "float32",
- "transformers_version": "4.52.2",
- "use_cache": true,
- "vocab_size": 32103
-}
diff --git a/bigscience_T0_3B_ssml/checkpoint-12/generation_config.json b/bigscience_T0_3B_ssml/checkpoint-12/generation_config.json
deleted file mode 100644
index e5bd0d092c1d352ad7d3dcf48fd7bd932f517767..0000000000000000000000000000000000000000
--- a/bigscience_T0_3B_ssml/checkpoint-12/generation_config.json
+++ /dev/null
@@ -1,7 +0,0 @@
-{
- "_from_model_config": true,
- "decoder_start_token_id": 0,
- "eos_token_id": 1,
- "pad_token_id": 0,
- "transformers_version": "4.52.2"
-}
diff --git a/bigscience_T0_3B_ssml/checkpoint-12/model.safetensors.index.json b/bigscience_T0_3B_ssml/checkpoint-12/model.safetensors.index.json
deleted file mode 100644
index e61bd22dcc964d5c50ef625d67ce259358be4040..0000000000000000000000000000000000000000
--- a/bigscience_T0_3B_ssml/checkpoint-12/model.safetensors.index.json
+++ /dev/null
@@ -1,565 +0,0 @@
-{
- "metadata": {
- "total_size": 11398619136
- },
- "weight_map": {
- "decoder.block.0.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
- "decoder.block.0.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
- "decoder.block.0.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
- "decoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight": "model-00001-of-00003.safetensors",
- "decoder.block.0.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
- "decoder.block.0.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
- "decoder.block.0.layer.1.EncDecAttention.k.weight": "model-00001-of-00003.safetensors",
- "decoder.block.0.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.0.layer.1.EncDecAttention.q.weight": "model-00001-of-00003.safetensors",
- "decoder.block.0.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.0.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.0.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
- "decoder.block.0.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
- "decoder.block.0.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
- "decoder.block.0.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.1.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.1.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.1.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.1.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.1.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.1.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.1.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.1.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.1.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.1.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.1.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
- "decoder.block.1.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
- "decoder.block.1.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
- "decoder.block.1.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.10.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.10.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.10.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.10.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.10.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.10.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.10.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.10.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.10.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.10.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.10.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
- "decoder.block.10.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
- "decoder.block.10.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
- "decoder.block.10.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.11.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.11.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.11.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.11.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.11.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.11.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.11.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.11.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.11.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.11.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.11.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
- "decoder.block.11.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
- "decoder.block.11.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
- "decoder.block.11.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.12.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.12.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.12.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.12.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.12.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.12.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.12.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.12.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.12.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.12.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.12.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
- "decoder.block.12.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
- "decoder.block.12.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
- "decoder.block.12.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.13.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.13.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.13.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.13.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.13.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.13.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.13.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.13.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.13.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.13.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.13.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
- "decoder.block.13.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
- "decoder.block.13.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
- "decoder.block.13.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.14.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.14.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.14.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.14.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.14.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.14.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.14.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.14.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.14.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.14.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.14.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
- "decoder.block.14.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
- "decoder.block.14.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
- "decoder.block.14.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.15.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.15.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.15.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.15.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.15.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.15.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.15.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.15.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.15.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.15.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.15.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
- "decoder.block.15.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
- "decoder.block.15.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
- "decoder.block.15.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.16.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.16.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.16.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.16.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.16.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.16.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.16.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.16.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.16.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.16.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.16.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
- "decoder.block.16.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
- "decoder.block.16.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
- "decoder.block.16.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.17.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.17.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.17.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.17.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.17.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.17.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.17.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.17.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.17.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.17.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.17.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
- "decoder.block.17.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
- "decoder.block.17.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
- "decoder.block.17.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.18.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.18.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.18.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.18.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.18.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.18.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.18.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.18.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.18.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.18.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.18.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
- "decoder.block.18.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
- "decoder.block.18.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
- "decoder.block.18.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.19.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.19.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.19.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.19.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.19.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.19.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.19.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.19.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.19.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.19.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.19.layer.2.DenseReluDense.wi_0.weight": "model-00003-of-00003.safetensors",
- "decoder.block.19.layer.2.DenseReluDense.wi_1.weight": "model-00003-of-00003.safetensors",
- "decoder.block.19.layer.2.DenseReluDense.wo.weight": "model-00003-of-00003.safetensors",
- "decoder.block.19.layer.2.layer_norm.weight": "model-00003-of-00003.safetensors",
- "decoder.block.2.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.2.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.2.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.2.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.2.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.2.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.2.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.2.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.2.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.2.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.2.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
- "decoder.block.2.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
- "decoder.block.2.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
- "decoder.block.2.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.20.layer.0.SelfAttention.k.weight": "model-00003-of-00003.safetensors",
- "decoder.block.20.layer.0.SelfAttention.o.weight": "model-00003-of-00003.safetensors",
- "decoder.block.20.layer.0.SelfAttention.q.weight": "model-00003-of-00003.safetensors",
- "decoder.block.20.layer.0.SelfAttention.v.weight": "model-00003-of-00003.safetensors",
- "decoder.block.20.layer.0.layer_norm.weight": "model-00003-of-00003.safetensors",
- "decoder.block.20.layer.1.EncDecAttention.k.weight": "model-00003-of-00003.safetensors",
- "decoder.block.20.layer.1.EncDecAttention.o.weight": "model-00003-of-00003.safetensors",
- "decoder.block.20.layer.1.EncDecAttention.q.weight": "model-00003-of-00003.safetensors",
- "decoder.block.20.layer.1.EncDecAttention.v.weight": "model-00003-of-00003.safetensors",
- "decoder.block.20.layer.1.layer_norm.weight": "model-00003-of-00003.safetensors",
- "decoder.block.20.layer.2.DenseReluDense.wi_0.weight": "model-00003-of-00003.safetensors",
- "decoder.block.20.layer.2.DenseReluDense.wi_1.weight": "model-00003-of-00003.safetensors",
- "decoder.block.20.layer.2.DenseReluDense.wo.weight": "model-00003-of-00003.safetensors",
- "decoder.block.20.layer.2.layer_norm.weight": "model-00003-of-00003.safetensors",
- "decoder.block.21.layer.0.SelfAttention.k.weight": "model-00003-of-00003.safetensors",
- "decoder.block.21.layer.0.SelfAttention.o.weight": "model-00003-of-00003.safetensors",
- "decoder.block.21.layer.0.SelfAttention.q.weight": "model-00003-of-00003.safetensors",
- "decoder.block.21.layer.0.SelfAttention.v.weight": "model-00003-of-00003.safetensors",
- "decoder.block.21.layer.0.layer_norm.weight": "model-00003-of-00003.safetensors",
- "decoder.block.21.layer.1.EncDecAttention.k.weight": "model-00003-of-00003.safetensors",
- "decoder.block.21.layer.1.EncDecAttention.o.weight": "model-00003-of-00003.safetensors",
- "decoder.block.21.layer.1.EncDecAttention.q.weight": "model-00003-of-00003.safetensors",
- "decoder.block.21.layer.1.EncDecAttention.v.weight": "model-00003-of-00003.safetensors",
- "decoder.block.21.layer.1.layer_norm.weight": "model-00003-of-00003.safetensors",
- "decoder.block.21.layer.2.DenseReluDense.wi_0.weight": "model-00003-of-00003.safetensors",
- "decoder.block.21.layer.2.DenseReluDense.wi_1.weight": "model-00003-of-00003.safetensors",
- "decoder.block.21.layer.2.DenseReluDense.wo.weight": "model-00003-of-00003.safetensors",
- "decoder.block.21.layer.2.layer_norm.weight": "model-00003-of-00003.safetensors",
- "decoder.block.22.layer.0.SelfAttention.k.weight": "model-00003-of-00003.safetensors",
- "decoder.block.22.layer.0.SelfAttention.o.weight": "model-00003-of-00003.safetensors",
- "decoder.block.22.layer.0.SelfAttention.q.weight": "model-00003-of-00003.safetensors",
- "decoder.block.22.layer.0.SelfAttention.v.weight": "model-00003-of-00003.safetensors",
- "decoder.block.22.layer.0.layer_norm.weight": "model-00003-of-00003.safetensors",
- "decoder.block.22.layer.1.EncDecAttention.k.weight": "model-00003-of-00003.safetensors",
- "decoder.block.22.layer.1.EncDecAttention.o.weight": "model-00003-of-00003.safetensors",
- "decoder.block.22.layer.1.EncDecAttention.q.weight": "model-00003-of-00003.safetensors",
- "decoder.block.22.layer.1.EncDecAttention.v.weight": "model-00003-of-00003.safetensors",
- "decoder.block.22.layer.1.layer_norm.weight": "model-00003-of-00003.safetensors",
- "decoder.block.22.layer.2.DenseReluDense.wi_0.weight": "model-00003-of-00003.safetensors",
- "decoder.block.22.layer.2.DenseReluDense.wi_1.weight": "model-00003-of-00003.safetensors",
- "decoder.block.22.layer.2.DenseReluDense.wo.weight": "model-00003-of-00003.safetensors",
- "decoder.block.22.layer.2.layer_norm.weight": "model-00003-of-00003.safetensors",
- "decoder.block.23.layer.0.SelfAttention.k.weight": "model-00003-of-00003.safetensors",
- "decoder.block.23.layer.0.SelfAttention.o.weight": "model-00003-of-00003.safetensors",
- "decoder.block.23.layer.0.SelfAttention.q.weight": "model-00003-of-00003.safetensors",
- "decoder.block.23.layer.0.SelfAttention.v.weight": "model-00003-of-00003.safetensors",
- "decoder.block.23.layer.0.layer_norm.weight": "model-00003-of-00003.safetensors",
- "decoder.block.23.layer.1.EncDecAttention.k.weight": "model-00003-of-00003.safetensors",
- "decoder.block.23.layer.1.EncDecAttention.o.weight": "model-00003-of-00003.safetensors",
- "decoder.block.23.layer.1.EncDecAttention.q.weight": "model-00003-of-00003.safetensors",
- "decoder.block.23.layer.1.EncDecAttention.v.weight": "model-00003-of-00003.safetensors",
- "decoder.block.23.layer.1.layer_norm.weight": "model-00003-of-00003.safetensors",
- "decoder.block.23.layer.2.DenseReluDense.wi_0.weight": "model-00003-of-00003.safetensors",
- "decoder.block.23.layer.2.DenseReluDense.wi_1.weight": "model-00003-of-00003.safetensors",
- "decoder.block.23.layer.2.DenseReluDense.wo.weight": "model-00003-of-00003.safetensors",
- "decoder.block.23.layer.2.layer_norm.weight": "model-00003-of-00003.safetensors",
- "decoder.block.3.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.3.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.3.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.3.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.3.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.3.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.3.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.3.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.3.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.3.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.3.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
- "decoder.block.3.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
- "decoder.block.3.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
- "decoder.block.3.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.4.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.4.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.4.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.4.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.4.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.4.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.4.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.4.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.4.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.4.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.4.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
- "decoder.block.4.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
- "decoder.block.4.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
- "decoder.block.4.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.5.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.5.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.5.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.5.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.5.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.5.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.5.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.5.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.5.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.5.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.5.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
- "decoder.block.5.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
- "decoder.block.5.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
- "decoder.block.5.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.6.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.6.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.6.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.6.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.6.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.6.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.6.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.6.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.6.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.6.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.6.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
- "decoder.block.6.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
- "decoder.block.6.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
- "decoder.block.6.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.7.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.7.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.7.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.7.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.7.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.7.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.7.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.7.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.7.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.7.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.7.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
- "decoder.block.7.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
- "decoder.block.7.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
- "decoder.block.7.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.8.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.8.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.8.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.8.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.8.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.8.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.8.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.8.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.8.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.8.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.8.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
- "decoder.block.8.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
- "decoder.block.8.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
- "decoder.block.8.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.9.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.9.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.9.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.9.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.9.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.9.layer.1.EncDecAttention.k.weight": "model-00002-of-00003.safetensors",
- "decoder.block.9.layer.1.EncDecAttention.o.weight": "model-00002-of-00003.safetensors",
- "decoder.block.9.layer.1.EncDecAttention.q.weight": "model-00002-of-00003.safetensors",
- "decoder.block.9.layer.1.EncDecAttention.v.weight": "model-00002-of-00003.safetensors",
- "decoder.block.9.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.block.9.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
- "decoder.block.9.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
- "decoder.block.9.layer.2.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
- "decoder.block.9.layer.2.layer_norm.weight": "model-00002-of-00003.safetensors",
- "decoder.final_layer_norm.weight": "model-00003-of-00003.safetensors",
- "encoder.block.0.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
- "encoder.block.0.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
- "encoder.block.0.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
- "encoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight": "model-00001-of-00003.safetensors",
- "encoder.block.0.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
- "encoder.block.0.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.0.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
- "encoder.block.0.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
- "encoder.block.0.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
- "encoder.block.0.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.1.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
- "encoder.block.1.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
- "encoder.block.1.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
- "encoder.block.1.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
- "encoder.block.1.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.1.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
- "encoder.block.1.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
- "encoder.block.1.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
- "encoder.block.1.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.10.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
- "encoder.block.10.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
- "encoder.block.10.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
- "encoder.block.10.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
- "encoder.block.10.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.10.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
- "encoder.block.10.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
- "encoder.block.10.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
- "encoder.block.10.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.11.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
- "encoder.block.11.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
- "encoder.block.11.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
- "encoder.block.11.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
- "encoder.block.11.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.11.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
- "encoder.block.11.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
- "encoder.block.11.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
- "encoder.block.11.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.12.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
- "encoder.block.12.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
- "encoder.block.12.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
- "encoder.block.12.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
- "encoder.block.12.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.12.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
- "encoder.block.12.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
- "encoder.block.12.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
- "encoder.block.12.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.13.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
- "encoder.block.13.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
- "encoder.block.13.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
- "encoder.block.13.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
- "encoder.block.13.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.13.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
- "encoder.block.13.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
- "encoder.block.13.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
- "encoder.block.13.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.14.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
- "encoder.block.14.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
- "encoder.block.14.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
- "encoder.block.14.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
- "encoder.block.14.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.14.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
- "encoder.block.14.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
- "encoder.block.14.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
- "encoder.block.14.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.15.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
- "encoder.block.15.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
- "encoder.block.15.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
- "encoder.block.15.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
- "encoder.block.15.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.15.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
- "encoder.block.15.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
- "encoder.block.15.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
- "encoder.block.15.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.16.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
- "encoder.block.16.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
- "encoder.block.16.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
- "encoder.block.16.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
- "encoder.block.16.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.16.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
- "encoder.block.16.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
- "encoder.block.16.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
- "encoder.block.16.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.17.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
- "encoder.block.17.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
- "encoder.block.17.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
- "encoder.block.17.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
- "encoder.block.17.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.17.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
- "encoder.block.17.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
- "encoder.block.17.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
- "encoder.block.17.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.18.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
- "encoder.block.18.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
- "encoder.block.18.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
- "encoder.block.18.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
- "encoder.block.18.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.18.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
- "encoder.block.18.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
- "encoder.block.18.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
- "encoder.block.18.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.19.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
- "encoder.block.19.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
- "encoder.block.19.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
- "encoder.block.19.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
- "encoder.block.19.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.19.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
- "encoder.block.19.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
- "encoder.block.19.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
- "encoder.block.19.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.2.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
- "encoder.block.2.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
- "encoder.block.2.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
- "encoder.block.2.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
- "encoder.block.2.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.2.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
- "encoder.block.2.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
- "encoder.block.2.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
- "encoder.block.2.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.20.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
- "encoder.block.20.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
- "encoder.block.20.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
- "encoder.block.20.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
- "encoder.block.20.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.20.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
- "encoder.block.20.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
- "encoder.block.20.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
- "encoder.block.20.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.21.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
- "encoder.block.21.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
- "encoder.block.21.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
- "encoder.block.21.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
- "encoder.block.21.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.21.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
- "encoder.block.21.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
- "encoder.block.21.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
- "encoder.block.21.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.22.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
- "encoder.block.22.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
- "encoder.block.22.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
- "encoder.block.22.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
- "encoder.block.22.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.22.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
- "encoder.block.22.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
- "encoder.block.22.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
- "encoder.block.22.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.23.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
- "encoder.block.23.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
- "encoder.block.23.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
- "encoder.block.23.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
- "encoder.block.23.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.23.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
- "encoder.block.23.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
- "encoder.block.23.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
- "encoder.block.23.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.3.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
- "encoder.block.3.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
- "encoder.block.3.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
- "encoder.block.3.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
- "encoder.block.3.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.3.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
- "encoder.block.3.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
- "encoder.block.3.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
- "encoder.block.3.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.4.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
- "encoder.block.4.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
- "encoder.block.4.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
- "encoder.block.4.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
- "encoder.block.4.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.4.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
- "encoder.block.4.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
- "encoder.block.4.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
- "encoder.block.4.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.5.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
- "encoder.block.5.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
- "encoder.block.5.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
- "encoder.block.5.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
- "encoder.block.5.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.5.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
- "encoder.block.5.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
- "encoder.block.5.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
- "encoder.block.5.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.6.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
- "encoder.block.6.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
- "encoder.block.6.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
- "encoder.block.6.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
- "encoder.block.6.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.6.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
- "encoder.block.6.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
- "encoder.block.6.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
- "encoder.block.6.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.7.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
- "encoder.block.7.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
- "encoder.block.7.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
- "encoder.block.7.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
- "encoder.block.7.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.7.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
- "encoder.block.7.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
- "encoder.block.7.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
- "encoder.block.7.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.8.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
- "encoder.block.8.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
- "encoder.block.8.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
- "encoder.block.8.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
- "encoder.block.8.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.8.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
- "encoder.block.8.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
- "encoder.block.8.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
- "encoder.block.8.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.9.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
- "encoder.block.9.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
- "encoder.block.9.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
- "encoder.block.9.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
- "encoder.block.9.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.block.9.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
- "encoder.block.9.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
- "encoder.block.9.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
- "encoder.block.9.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
- "encoder.final_layer_norm.weight": "model-00001-of-00003.safetensors",
- "lm_head.weight": "model-00003-of-00003.safetensors",
- "shared.weight": "model-00001-of-00003.safetensors"
- }
-}
diff --git a/bigscience_T0_3B_ssml/checkpoint-12/special_tokens_map.json b/bigscience_T0_3B_ssml/checkpoint-12/special_tokens_map.json
deleted file mode 100644
index 17ade346a1042cbe0c1436f5bedcbd85c099d582..0000000000000000000000000000000000000000
--- a/bigscience_T0_3B_ssml/checkpoint-12/special_tokens_map.json
+++ /dev/null
@@ -1,125 +0,0 @@
-{
- "additional_special_tokens": [
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "",
- "