## Model Description

This model was created by merging source models layer by layer, analyzing and selecting optimal layers based on dimensional utilization efficiency. The process follows these steps:

**Layer Analysis**
- Downloads the base and fine-tuned models from the Hugging Face Hub
- Calculates the Normalized Effective Rank (NER) for each layer
- NER measures how effectively each layer utilizes its dimensions through entropy analysis of singular value distributions

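As an illustration of this step, the sketch below downloads two models and groups their 2-D weight matrices by layer index. The repository ids and the LLaMA-style `layers.N.` parameter naming are assumptions, not taken from this repo; the NER score itself can then be computed per matrix with the function sketched at the end of this section.

```python
import re
from collections import defaultdict

import torch
from transformers import AutoModelForCausalLM

BASE_ID = "org/base-model"            # placeholder repository ids
FINETUNED_ID = "org/fine-tuned-model"

def layer_weights(model):
    """Group a model's 2-D weight matrices by transformer layer index."""
    layers = defaultdict(dict)
    for name, param in model.named_parameters():
        match = re.search(r"layers\.(\d+)\.", name)  # LLaMA-style naming assumed
        if match and param.ndim == 2:                # only matrices have singular values
            layers[int(match.group(1))][name] = param.detach()
    return layers

base = AutoModelForCausalLM.from_pretrained(BASE_ID, torch_dtype=torch.float32)
tuned = AutoModelForCausalLM.from_pretrained(FINETUNED_ID, torch_dtype=torch.float32)

base_layers, tuned_layers = layer_weights(base), layer_weights(tuned)
# Each matrix in base_layers / tuned_layers can now be scored with the NER
# function sketched at the end of this section.
```
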
**Layer Selection**
- Identifies common layer structures across models
- Ranks layers based on their NER scores
- Selects the highest-performing layers from each model
- Creates a mapping of optimal layer sources

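A toy sketch of the selection logic, using made-up NER scores to show how a per-layer source mapping could be built:

```python
# Toy example: NER scores per layer for each candidate source model (made up).
ner_scores = {
    "base":       {0: 0.71, 1: 0.64, 2: 0.69},
    "fine-tuned": {0: 0.68, 1: 0.73, 2: 0.75},
}

# Layers that exist in every model (the common layer structure).
common_layers = set.intersection(*(set(scores) for scores in ner_scores.values()))

# Map each layer index to the model whose version of that layer scored highest.
layer_sources = {
    idx: max(ner_scores, key=lambda model: ner_scores[model][idx])
    for idx in sorted(common_layers)
}
print(layer_sources)  # {0: 'base', 1: 'fine-tuned', 2: 'fine-tuned'}
```
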
**Model Composition**
- Creates a new model starting from the base architecture
- Systematically replaces layers with their highest-performing counterparts
- Preserves the model architecture while optimizing layer-wise performance
- Maintains compatibility with the original tokenizer and configuration

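A rough sketch of the composition step, under the same assumptions as above (placeholder repository ids, a LLaMA-style `model.layers.N.` prefix, and an illustrative `layer_sources` mapping):

```python
from transformers import AutoModelForCausalLM

merged_model = AutoModelForCausalLM.from_pretrained("org/base-model")
tuned_model = AutoModelForCausalLM.from_pretrained("org/fine-tuned-model")

layer_sources = {0: "base", 1: "fine-tuned", 2: "fine-tuned"}  # from layer selection

merged_state = merged_model.state_dict()
tuned_state = tuned_model.state_dict()

for idx, source in layer_sources.items():
    if source != "fine-tuned":
        continue                              # keep the base layer as-is
    prefix = f"model.layers.{idx}."           # module path assumed, adjust per architecture
    for name in merged_state:
        if name.startswith(prefix):
            merged_state[name] = tuned_state[name].clone()

merged_model.load_state_dict(merged_state)    # merged_model now holds the composite weights
```
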
**Output Generation**
- Saves the composite model with complete weights and configuration
- Generates detailed merge reports documenting layer sources
- Copies the necessary tokenizer files from the base model

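And a sketch of how the outputs might be written, taking the `merged_model` and `layer_sources` objects from the composition sketch as inputs (the path, base repo id, and report format are illustrative):

```python
import json
from transformers import AutoTokenizer

def save_merged(merged_model, layer_sources, output_dir="./merged-model",
                base_id="org/base-model"):
    # Composite weights + configuration
    merged_model.save_pretrained(output_dir)
    # Tokenizer files re-saved from the base model
    AutoTokenizer.from_pretrained(base_id).save_pretrained(output_dir)
    # Merge report documenting where each layer came from
    with open(f"{output_dir}/merge_report.json", "w") as f:
        json.dump({"base_model": base_id,
                   "layer_sources": {str(k): v for k, v in layer_sources.items()}},
                  f, indent=2)

save_merged(merged_model, layer_sources)  # objects from the composition sketch above
```
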
NER measures how effectively a neural network layer utilizes its available dimensions through entropy analysis of its singular value distribution. The calculation proceeds as follows:

1. **Singular Value Decomposition**
   - Input: weight matrix A ∈ R^(m×n)
   - Compute the singular values σᵢ, where σᵢ ≥ 0
   - Keep only values above a numerical threshold (σᵢ > 1e-12)

2. **Distribution Normalization**
   - Sum all singular values: S = Σσᵢ
   - Create a probability distribution: pᵢ = σᵢ/S

3. **Entropy Calculation**
   - Compute the Shannon entropy: H = -Σ pᵢ log₂(pᵢ)
   - Calculate the maximum possible entropy: H_max = log₂(n), where n is the number of singular values

4. **Normalization**
   - Final NER score = H/H_max
   - Results in a value between 0 and 1
   - Higher scores indicate more uniform dimensional utilization

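A minimal PyTorch sketch of this calculation; the function name and the random-matrix example are illustrative, and the `eps` argument matches the 1e-12 cut-off above:

```python
import math
import torch

def normalized_effective_rank(weight: torch.Tensor, eps: float = 1e-12) -> float:
    # 1. Singular values of the weight matrix, keeping only sigma_i > eps
    sigma = torch.linalg.svdvals(weight.float())
    sigma = sigma[sigma > eps]
    if sigma.numel() < 2:
        return 0.0  # degenerate case: at most one significant direction

    # 2. Normalize to a probability distribution: p_i = sigma_i / S
    p = sigma / sigma.sum()

    # 3. Shannon entropy H = -sum(p_i * log2(p_i)) and maximum entropy H_max = log2(n)
    entropy = -(p * torch.log2(p)).sum().item()
    max_entropy = math.log2(p.numel())

    # 4. NER = H / H_max, a value in [0, 1]
    return entropy / max_entropy

# Example: score a random 512x512 matrix
print(normalized_effective_rank(torch.randn(512, 512)))
```

In this form, a layer whose singular values are spread nearly uniformly scores close to 1, while a layer dominated by a few directions scores close to 0.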