jeffmeloy committed on
Commit e911d56 · verified · 1 Parent(s): 565a12d

Update README.md

Files changed (1): README.md +44 -1

## Model Description

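Composite model created by analyzing the layers of a base model and one or more fine-tuned models, then selecting layers by their dimensional utilization efficiency, measured as Normalized Effective Rank (NER). The process follows these steps:
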
**Layer Analysis**
- Downloads the base and fine-tuned models from the Hugging Face Hub
- Calculates the Normalized Effective Rank (NER) for each layer
- NER measures how effectively a layer utilizes its dimensions, based on the entropy of its singular value distribution

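As a rough illustration of the per-layer score, here is a minimal pure-Python sketch, assuming a layer's weight matrix is given as a small nested list. `singular_values` and `normalized_effective_rank` are hypothetical helper names; the Jacobi eigenvalue routine only stands in for a library SVD such as `numpy.linalg.svd(W, compute_uv=False)` or `torch.linalg.svdvals(W)`, and returning 0.0 when fewer than two singular values survive the threshold is an assumption.

```python
import math
from typing import List, Sequence


def singular_values(matrix: Sequence[Sequence[float]], sweeps: int = 100) -> List[float]:
    """Return the singular values of a small dense matrix, largest first.

    Illustration only: the eigenvalues of the Gram matrix are found with
    cyclic Jacobi rotations and square-rooted. Real code would call a
    library SVD instead.
    """
    rows = [list(map(float, r)) for r in matrix]
    if len(rows) < len(rows[0]):
        rows = [list(col) for col in zip(*rows)]  # use the smaller Gram matrix
    m, n = len(rows), len(rows[0])
    # Gram matrix G = AᵀA of the tall orientation; its eigenvalues are σᵢ².
    g = [[sum(rows[r][i] * rows[r][j] for r in range(m)) for j in range(n)]
         for i in range(n)]
    for _ in range(sweeps):
        off = sum(g[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < 1e-24:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if g[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the rotated g[p][q] is zero.
                theta = (g[q][q] - g[p][p]) / (2.0 * g[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # update columns p, q: G <- G J
                    gkp, gkq = g[k][p], g[k][q]
                    g[k][p], g[k][q] = c * gkp - s * gkq, s * gkp + c * gkq
                for k in range(n):  # update rows p, q: G <- Jᵀ G
                    gpk, gqk = g[p][k], g[q][k]
                    g[p][k], g[q][k] = c * gpk - s * gqk, s * gpk + c * gqk
    # Clamp tiny negative round-off before taking square roots.
    return sorted((math.sqrt(max(g[i][i], 0.0)) for i in range(n)), reverse=True)


def normalized_effective_rank(sigmas: Sequence[float], eps: float = 1e-12) -> float:
    """Entropy of the normalized singular values divided by its maximum."""
    # Keep only singular values above the numerical threshold.
    kept = [s for s in sigmas if s > eps]
    k = len(kept)
    if k < 2:
        # Assumption: H_max = log2(1) = 0 makes H / H_max undefined; report 0.0.
        return 0.0
    # Turn the singular values into a probability distribution p_i = σ_i / S.
    total = sum(kept)
    probs = [s / total for s in kept]
    # Shannon entropy in bits, divided by the maximum possible entropy log2(k).
    entropy = -sum(p * math.log2(p) for p in probs)
    return entropy / math.log2(k)
```

For A = [[3, 0], [4, 5]] the singular values are √45 and √5, so p = (0.75, 0.25) and the score is about 0.811; four equal singular values give exactly 1.0.
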
**Layer Selection**
- Identifies the layer structures common to all models
- Ranks layers by their NER scores
- Selects the highest-scoring layers from each model
- Creates a mapping from each selected layer to its source model

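A minimal sketch of the selection step, assuming the NER scores are already collected in a nested dict `{model_name: {layer_name: score}}`. It reads "highest-scoring layers" as a per-layer choice: for every layer name present in all models, the model with the highest NER becomes that layer's source. The function name, the data layout and the tie rule are assumptions, not the repository's code.

```python
from typing import Dict


def select_layer_sources(ner_scores: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Map each shared layer name to the model whose copy has the highest NER."""
    if not ner_scores:
        return {}
    # Layers present in every model form the common layer structure.
    common = set.intersection(*(set(layers) for layers in ner_scores.values()))
    mapping = {}
    for layer in sorted(common):
        # max() keeps the first model in dict order when scores tie.
        mapping[layer] = max(ner_scores, key=lambda model: ner_scores[model][layer])
    return mapping
```

With scores such as `{"base": {"layers.0.mlp": 0.81, "layers.1.mlp": 0.77, "lm_head": 0.90}, "ft-a": {"layers.0.mlp": 0.85, "layers.1.mlp": 0.74}, "ft-b": {"layers.0.mlp": 0.79, "layers.1.mlp": 0.80}}` (hypothetical names and values), `lm_head` is skipped because only the base model has it, and the two shared layers come from `ft-a` and `ft-b` respectively. Layers left out of the mapping simply stay as they are in the base model.
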
**Model Composition**
- Creates a new model starting from the base architecture
- Systematically replaces layers with their highest-scoring counterparts
- Preserves the model architecture while maximizing layer-wise NER
- Maintains compatibility with the original tokenizer and configuration

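The composition step can be sketched on plain name-to-tensor mappings (state dicts), assuming a layer-source mapping like the one produced during selection. `compose_state_dict` is a hypothetical helper; the shape check is one way to enforce "preserves the model architecture". In a PyTorch workflow the result could then be loaded into a base-model instance with `torch.nn.Module.load_state_dict`.

```python
from typing import Dict, Mapping


def compose_state_dict(
    base: Mapping[str, object],
    models: Mapping[str, Mapping[str, object]],
    sources: Mapping[str, str],
) -> Dict[str, object]:
    """Return a copy of `base` with selected entries swapped in from other models."""
    # Start from the base architecture: every base parameter is kept by default.
    composite = dict(base)
    for name, model_name in sources.items():
        if name not in composite:
            raise KeyError(f"{name!r} is not a parameter of the base model")
        replacement = models[model_name][name]
        # Refuse swaps that would change the architecture. Tensors and arrays
        # expose `.shape`; plain Python objects fall back to None == None.
        if getattr(replacement, "shape", None) != getattr(composite[name], "shape", None):
            raise ValueError(f"shape mismatch for {name!r}")
        composite[name] = replacement
    return composite
```

Copying the base mapping first means the base model is never mutated, and any parameter without an entry in `sources` (embeddings, norms, heads) is inherited unchanged.
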
**Output Generation**
- Saves the composite model with its complete weights and configuration
- Generates detailed merge reports documenting each layer's source
- Copies the necessary tokenizer files from the base model

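A minimal sketch of the report and tokenizer-copy part of this step, assuming a layer-source mapping and a local directory holding the base model's files. The report file name `merge_report.json`, its fields and the tokenizer file list are hypothetical choices; saving the weights themselves is left to the framework (for example `save_pretrained` in Hugging Face Transformers).

```python
import json
import shutil
from collections import Counter
from pathlib import Path
from typing import Iterable, List, Mapping, Tuple

# Hypothetical default: file names commonly shipped with Hugging Face
# tokenizers. Adjust for the actual base model.
TOKENIZER_FILES = (
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
    "tokenizer.model",
)


def write_merge_outputs(
    output_dir: str,
    base_dir: str,
    sources: Mapping[str, str],
    tokenizer_files: Iterable[str] = TOKENIZER_FILES,
) -> Tuple[Path, List[str]]:
    """Write a JSON merge report and copy tokenizer files from the base model."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    report = {
        # Which model each merged layer came from, sorted for stable diffs.
        "layer_sources": dict(sorted(sources.items())),
        # How many layers each model contributed.
        "layers_per_model": dict(Counter(sources.values())),
    }
    report_path = out / "merge_report.json"
    report_path.write_text(json.dumps(report, indent=2), encoding="utf-8")
    copied = []
    for name in tokenizer_files:
        src = Path(base_dir) / name
        if src.is_file():  # not every tokenizer ships every file
            shutil.copy2(src, out / name)
            copied.append(name)
    return report_path, copied
```

Missing tokenizer files are skipped rather than treated as errors, since different tokenizers ship different subsets of these files.
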
NER measures how effectively a neural network layer utilizes its available dimensions through entropy analysis of its singular value distribution. The calculation proceeds as follows:

1. **Singular Value Decomposition**
   - Input: weight matrix A ∈ ℝ^(m×n)
   - Compute the singular values σᵢ of A, where σᵢ ≥ 0
   - Keep only the values above the numerical threshold (σᵢ > 1e-12)

2. **Distribution Normalization**
   - Sum the singular values: S = Σσᵢ
   - Create a probability distribution: pᵢ = σᵢ/S

3. **Entropy Calculation**
   - Compute the Shannon entropy: H = -Σ(pᵢ · log₂(pᵢ))
   - Calculate the maximum possible entropy: H_max = log₂(k), where k is the number of singular values (at most min(m, n))

4. **Normalization**
   - Final NER score = H/H_max
   - Results in a value between 0 and 1
   - Higher scores indicate more uniform dimen