---
license: apache-2.0
base_model:
- Qwen/Qwen2.5-7B
pipeline_tag: text-generation
tags:
- not-for-all-audiences
language:
- en
---

## Model Description

This model is a merge built by analyzing the layers of several models and selecting, for each layer, the version that uses its dimensions most efficiently. The process follows these steps:

**Layer Analysis**
- Downloads the base and fine-tuned models from the Hugging Face Hub
- Calculates the Normalized Effective Rank (NER) for each layer
- NER measures how effectively each layer utilizes its dimensions through entropy analysis of singular value distributions

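As a rough illustration of the NER score used in this step, the sketch below computes it from a list of singular values using only the standard library. It assumes the singular values were already obtained elsewhere (for example with `numpy.linalg.svd(A, compute_uv=False)`); the function name, the use of the retained count as n, and the handling of degenerate inputs are assumptions made for this example, not the original implementation.

```python
import math


def normalized_effective_rank(singular_values, threshold=1e-12):
    """Return H / H_max for the given singular values (hypothetical helper)."""
    # Keep only singular values above the numerical threshold.
    kept = [s for s in singular_values if s > threshold]
    # Assumption: with fewer than two values, log2(n) is zero or undefined,
    # so the score is reported as 0.0.
    if len(kept) < 2:
        return 0.0
    # Normalize the singular values into a probability distribution.
    total = sum(kept)
    probs = [s / total for s in kept]
    # Shannon entropy in bits, divided by its maximum possible value log2(n).
    entropy = -sum(p * math.log2(p) for p in probs)
    max_entropy = math.log2(len(kept))
    return entropy / max_entropy


# Equal singular values use every dimension evenly.
uniform = normalized_effective_rank([1.0, 1.0, 1.0, 1.0])  # 1.0
# One dominant singular value gives a much lower score.
skewed = normalized_effective_rank([10.0, 0.1, 0.1, 0.1])
```

A layer whose singular values are all equal scores 1.0, while a layer dominated by a single direction scores close to 0.
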
**Layer Selection**
- Identifies the layer structures common to all models
- Ranks layers by their NER scores
- Selects the highest-performing layers from each model
- Creates a mapping of optimal layer sources

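One way to read the selection step is as a per-layer comparison: for every layer present in all models, keep the model whose version has the highest NER. The sketch below follows that reading. The function name, the model and layer names, and the scores are hypothetical, and the tie-breaking rule is an arbitrary choice for the example.

```python
def select_layer_sources(ner_scores):
    """Map each shared layer to the model with the highest NER score.

    ner_scores has the shape {model_name: {layer_name: ner_score}}.
    """
    # Only layers present in every model are candidates for swapping.
    common = set.intersection(*(set(layers) for layers in ner_scores.values()))
    mapping = {}
    for layer in sorted(common):
        # max() returns the first model on ties, i.e. insertion order wins.
        mapping[layer] = max(ner_scores, key=lambda m: ner_scores[m][layer])
    return mapping


# Made-up scores for illustration only.
scores = {
    "base": {"layers.0.mlp": 0.71, "layers.1.mlp": 0.80, "lm_head": 0.50},
    "finetune": {"layers.0.mlp": 0.75, "layers.1.mlp": 0.78},
}
layer_sources = select_layer_sources(scores)
# {"layers.0.mlp": "finetune", "layers.1.mlp": "base"}
```

Layers missing from any model (here `lm_head`) are left out of the mapping, so they would stay with whatever the base model provides.
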
**Model Composition**
- Creates a new model starting from the base architecture
- Systematically replaces layers with their highest-performing counterparts
- Preserves the model architecture while optimizing layer-wise performance
- Maintains compatibility with the original tokenizer and configuration

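The composition step can be pictured as starting from the base weights and overwriting individual entries with the selected replacements. The sketch below uses plain dictionaries and lists as stand-ins for state dicts and tensors; the function name, its parameters, and the sample data are assumptions for illustration, not the original merge script.

```python
def compose_state_dict(base_state, other_states, layer_sources, base_name="base"):
    """Start from the base weights and swap in layers from the selected sources."""
    # Shallow copy so the base dictionary itself is left untouched.
    merged = dict(base_state)
    for layer, source in layer_sources.items():
        if source == base_name:
            continue  # the base version is already in place
        merged[layer] = other_states[source][layer]
    return merged


# Lists stand in for weight tensors; all names and values are made up.
base_state = {"layers.0.mlp": [0.1, 0.2], "layers.1.mlp": [0.3, 0.4], "lm_head": [0.5]}
other_states = {"finetune": {"layers.0.mlp": [1.1, 1.2], "layers.1.mlp": [1.3, 1.4]}}
layer_sources = {"layers.0.mlp": "finetune", "layers.1.mlp": "base"}

merged = compose_state_dict(base_state, other_states, layer_sources)
```

Because the merged dictionary keeps exactly the keys of the base model, the result still matches the base architecture, configuration, and tokenizer.
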
**Output Generation**
- Saves the composite model with complete weights and configuration
- Generates detailed merge reports documenting layer sources
- Copies the necessary tokenizer files from the base model

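The format of the merge report is not described here. As a purely hypothetical sketch, a report documenting layer sources could be a small JSON document with a per-source count and the full layer mapping; the function name and field names below are invented for the example.

```python
import json
from collections import Counter


def build_merge_report(layer_sources):
    """Summarize which model each layer came from (hypothetical report layout)."""
    counts = Counter(layer_sources.values())
    return {
        "layers_per_source": dict(counts),
        "layer_sources": dict(sorted(layer_sources.items())),
    }


# Made-up mapping for illustration only.
report = build_merge_report({"layers.0.mlp": "finetune", "layers.1.mlp": "base"})
report_json = json.dumps(report, indent=2)
```

The resulting string can be written next to the saved weights so the origin of every layer can be checked later.
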
NER measures how effectively a neural network layer utilizes its available dimensions through entropy analysis of its singular value distribution. The calculation proceeds as follows:

1. **Singular Value Decomposition**
   - Input: Weight matrix A ∈ R^(m×n)
   - Compute singular values σᵢ where σᵢ ≥ 0
   - Keep only the values above a numerical threshold (σᵢ > 1e-12)

2. **Distribution Normalization**
   - Sum all singular values: S = Σσᵢ
   - Create probability distribution: pᵢ = σᵢ/S

3. **Entropy Calculation**
   - Compute Shannon entropy: H = -Σ(pᵢ * log₂(pᵢ))
   - Calculate maximum possible entropy: H_max = log₂(n), where n is the number of singular values

4. **Normalization**
   - Final NER score = H/H_max
   - Results in a value between 0 and 1
   - Higher scores indicate more uniform dimen