giannisan/Mistral-10.7B-Instruct-v0.3-depth-upscaling

giannisan committed on May 26, 2024

Commit ad01868 · verified · 1 Parent(s): c32defb

Update README.md

Files changed (1)

README.md +40 -38

@@ -1,38 +1,40 @@
----
-base_model:
-- mistralai/Mistral-7B-Instruct-v0.3
-library_name: transformers
-tags:
-- mergekit
-- merge
----
-# mistral-7b-instruct-v0.3-depth-upscaling
-This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
-## Merge Details
-### Merge Method
-This model was merged using the passthrough merge method.
-### Models Merged
-The following models were included in the merge:
-* [mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3)
-### Configuration
-The following YAML configuration was used to produce this model:
-```yaml
-slices:
-  - sources:
-    - model: mistralai/Mistral-7B-Instruct-v0.3
-      layer_range: [0, 24]
-  - sources:
-    - model: mistralai/Mistral-7B-Instruct-v0.3
-      layer_range: [8, 32]
-merge_method: passthrough
-dtype: bfloat16
-```

+---
+base_model:
+- mistralai/Mistral-7B-Instruct-v0.3
+library_name: transformers
+tags:
+- mergekit
+- merge
+---
+# mistral-7b-instruct-v0.3-depth-upscaling
+This is an attempt at depth up-scaling, based on the paper "SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling" (arXiv:2312.15166). The technique is designed to scale large language models efficiently. It begins with structural depthwise scaling, which may initially reduce performance; according to the paper, performance is rapidly restored during a continued pretraining phase that adapts the expanded model's parameters to the new depth.
+Note that this model represents only the initial phase of development. The next critical steps involve fine-tuning.
+## Merge Details
+### Merge Method
+This model was merged using the passthrough merge method.
+### Models Merged
+The following models were included in the merge:
+* [mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3)
+### Configuration
+The following YAML configuration was used to produce this model:
+```yaml
+slices:
+  - sources:
+    - model: mistralai/Mistral-7B-Instruct-v0.3
+      layer_range: [0, 24]
+  - sources:
+    - model: mistralai/Mistral-7B-Instruct-v0.3
+      layer_range: [8, 32]
+merge_method: passthrough
+dtype: bfloat16
+```
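
The following note is not part of the commit. As a rough check on what the configuration above produces, this minimal Python sketch expands the two `layer_range` slices into the merged layer order and estimates the parameter count. It assumes that `layer_range` is half-open (`[start, end)`) and that the base model uses the dimensions shown as default arguments (hidden size 4096, MLP size 14336, 32 attention heads, 8 key/value heads, vocabulary 32768, untied embeddings). The helper names `expand_slices` and `estimate_params` are hypothetical and are not part of mergekit.

```python
from collections import Counter

# Slice values copied from the YAML configuration above.
# Assumption: each layer_range is half-open, i.e. [start, end).
SLICES = [(0, 24), (8, 32)]


def expand_slices(slices):
    """Return the source-layer index used for each layer of the merged model."""
    return [layer for start, end in slices for layer in range(start, end)]


def estimate_params(num_layers, hidden=4096, intermediate=14336,
                    num_heads=32, num_kv_heads=8, vocab=32768):
    """Rough parameter count for a Mistral-style decoder (assumed dimensions)."""
    head_dim = hidden // num_heads
    kv_dim = head_dim * num_kv_heads
    # Attention: q and o projections are hidden x hidden, k and v are hidden x kv_dim.
    attention = 2 * hidden * hidden + 2 * hidden * kv_dim
    # Gated MLP: gate, up, and down projections.
    mlp = 3 * hidden * intermediate
    # Two RMSNorm weight vectors per layer.
    norms = 2 * hidden
    per_layer = attention + mlp + norms
    # Input embeddings, untied output head, and the final norm.
    embeddings = 2 * vocab * hidden + hidden
    return num_layers * per_layer + embeddings


merged_layers = expand_slices(SLICES)
counts = Counter(merged_layers)
duplicated = sorted(layer for layer, n in counts.items() if n > 1)

print("merged depth:", len(merged_layers))
print("duplicated source layers:", duplicated[0], "to", duplicated[-1])
print("estimated parameters (billions):", estimate_params(len(merged_layers)) / 1e9)
```

Under these assumptions the merged model has 48 layers, with source layers 8 through 23 appearing twice, which matches the depth up-scaling layout described in the SOLAR paper and the "10.7B" in the repository name.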