Update README.md
## Model Description
Model merging is performed by analyzing layers and selecting the optimal ones based on dimensional utilization efficiency. The process follows these steps:

Layer Analysis
- Downloads base and fine-tuned models from Hugging Face Hub
- Calculates Normalized Effective Rank (NER) for each layer
- NER measures how effectively each layer utilizes its dimensions through entropy analysis of singular value distributions

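The NER calculation named in the Layer Analysis step can be sketched as follows. This is a minimal illustration using NumPy, not the code used to build this model. The function name `normalized_effective_rank`, the `eps` parameter, the choice to normalize by the number of singular values retained after filtering, and the choice to return 0.0 when at most one value is retained are all assumptions made for the example.

```python
import numpy as np


def normalized_effective_rank(weight, eps=1e-12):
    """Entropy-based NER of a weight matrix, as a float in [0, 1].

    Hypothetical helper: name, signature, and edge-case handling are
    assumptions for illustration only.
    """
    # Treat 1-D tensors (e.g. biases) as a single-row matrix.
    matrix = np.atleast_2d(np.asarray(weight, dtype=np.float64))
    # Singular values only; they are non-negative by definition.
    sigma = np.linalg.svd(matrix, compute_uv=False)
    # Keep values above the numerical threshold.
    sigma = sigma[sigma > eps]
    if sigma.size <= 1:
        # Assumption: with zero or one retained value H_max is 0,
        # so the score is reported as 0.0 instead of dividing by zero.
        return 0.0
    # Normalize the singular values into a probability distribution.
    p = sigma / sigma.sum()
    # Shannon entropy in bits, and its maximum for this many values.
    entropy = -np.sum(p * np.log2(p))
    max_entropy = np.log2(sigma.size)
    return float(entropy / max_entropy)
```

A matrix whose singular values are all equal, such as `np.eye(4)`, scores 1.0 under this sketch, while a matrix with a single non-zero singular value scores 0.0.
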
Layer Selection
- Identifies common layer structures across models
- Ranks layers based on their NER scores
- Selects the highest-scoring layers from each model
- Creates a mapping of optimal layer sources

Model Composition
- Creates a new model starting from the base architecture
- Systematically replaces layers with their highest-scoring counterparts
- Preserves model architecture while optimizing layer-wise performance
- Maintains compatibility with original tokenizer and configuration

Output Generation
- Saves the composite model with complete weights and configuration
- Generates detailed merge reports documenting layer sources
- Copies necessary tokenizer files from base model

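The selection and composition steps above can be sketched with plain dictionaries. This is a simplified illustration, not the actual merge script: the function names `select_layer_sources` and `compose_state_dict`, the layer names, and the scores are all hypothetical, and real weights would be tensors rather than the placeholder strings used here.

```python
def select_layer_sources(ner_scores):
    """Map each layer shared by all models to the model with the highest NER.

    ner_scores: {model_name: {layer_name: ner_score}} (hypothetical layout).
    """
    models = list(ner_scores)
    # Only layers present in every model are candidates for replacement.
    common = set(ner_scores[models[0]])
    for model in models[1:]:
        common &= set(ner_scores[model])
    # max() keeps the first model on ties, so list the base model first
    # if ties should fall back to the base weights.
    return {
        layer: max(models, key=lambda m: ner_scores[m][layer])
        for layer in sorted(common)
    }


def compose_state_dict(base_state, state_dicts, sources):
    """Start from the base weights and swap in each selected layer."""
    merged = dict(base_state)
    for layer, model in sources.items():
        merged[layer] = state_dicts[model][layer]
    return merged


# Hypothetical scores and placeholder "weights" for two models.
scores = {
    "base": {"layers.0": 0.61, "layers.1": 0.70, "lm_head": 0.50},
    "tuned": {"layers.0": 0.66, "layers.1": 0.64},
}
weights = {
    "base": {"layers.0": "b0", "layers.1": "b1", "lm_head": "bh"},
    "tuned": {"layers.0": "t0", "layers.1": "t1"},
}
sources = select_layer_sources(scores)
merged = compose_state_dict(weights["base"], weights, sources)
```

In this sketch `sources` doubles as the merge report, since it records which model each layer was taken from. Layers that exist only in the base model, such as `lm_head` here, are left untouched.
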
NER measures how effectively a neural network layer utilizes its available dimensions through entropy analysis of its singular value distribution. The calculation proceeds as follows:

1. **Singular Value Decomposition**
- Input: Weight matrix A ∈ R^(m×n)
- Compute singular values σᵢ where σᵢ ≥ 0
- Keep only values above the numerical threshold (>1e-12)

2. **Distribution Normalization**
- Sum all singular values: S = Σσᵢ
- Create probability distribution: pᵢ = σᵢ/S

3. **Entropy Calculation**
- Compute Shannon entropy: H = -Σ(pᵢ * log₂(pᵢ))
- Calculate maximum possible entropy: H_max = log₂(n)
  where n is the number of singular values (a count, not the column dimension n of A in step 1)

4. **Normalization**
- Final NER score = H/H_max
- Results in a value between 0 and 1
- Higher scores indicate more uniform dimen