---
license: apache-2.0
base_model:
- Qwen/Qwen2.5-7B
pipeline_tag: text-generation
tags:
- not-for-all-audiences
language:
- en
---
## Model Description
This model is a layer-wise merge: each layer is analyzed and selected according to its dimensional utilization efficiency. The process follows these steps:
### Layer Analysis
- Downloads the base and fine-tuned models from the Hugging Face Hub
- Calculates the Normalized Effective Rank (NER) of each layer
- NER measures how effectively each layer utilizes its dimensions through entropy analysis of singular value distributions
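A minimal sketch of the analysis step, assuming the base and fine-tuned models have already been downloaded and loaded as dictionaries that map tensor names to NumPy arrays. The names `ner` and `analyze_models` are hypothetical, and skipping 1-D tensors (biases, norm weights) is an assumption of this sketch rather than documented behavior.

```python
import numpy as np


def ner(A: np.ndarray, eps: float = 1e-12) -> float:
    """Compact NER: entropy of the normalized singular values divided by log2(k)."""
    s = np.linalg.svd(np.asarray(A, dtype=np.float64), compute_uv=False)
    s = s[s > eps]
    if s.size < 2:  # assumption: degenerate spectra (H_max = 0) score 0.0
        return 0.0
    p = s / s.sum()
    return float(-np.sum(p * np.log2(p)) / np.log2(s.size))


def analyze_models(models: dict[str, dict[str, np.ndarray]]) -> dict[str, dict[str, float]]:
    """Score every 2-D weight in every model's state dict by NER."""
    scores: dict[str, dict[str, float]] = {}
    for model_name, state_dict in models.items():
        scores[model_name] = {
            layer: ner(weight)
            for layer, weight in state_dict.items()
            if weight.ndim == 2  # assumption: 1-D tensors are not scored
        }
    return scores


models = {
    "base": {"mlp.weight": np.eye(4), "norm.weight": np.ones(4)},
    "tuned": {"mlp.weight": np.diag([3.0, 1.0, 1.0, 1.0]), "norm.weight": np.ones(4)},
}
print(analyze_models(models))
```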
### Layer Selection
- Identifies common layer structures across models
- Ranks layers based on their NER scores
- Selects, for each layer, the version with the highest NER score across the models
- Creates a mapping of optimal layer sources
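The selection step could look like the following sketch. It assumes scores are stored as `{model_name: {layer_name: ner_score}}`; `select_layers` is a hypothetical helper, and the scores in the usage lines are made up for illustration.

```python
def select_layers(scores: dict[str, dict[str, float]]) -> dict[str, str]:
    """Map each layer name shared by all models to the model whose copy has the highest NER."""
    # Common layer structure: names present in every model's score table.
    common = set.intersection(*(set(layers) for layers in scores.values()))
    mapping: dict[str, str] = {}
    for layer in sorted(common):
        # Rank the candidates for this layer by NER and keep the top one.
        mapping[layer] = max(scores, key=lambda model: scores[model][layer])
    return mapping


scores = {
    "base": {"layers.0.mlp": 0.71, "layers.1.mlp": 0.80},
    "tuned": {"layers.0.mlp": 0.75, "layers.1.mlp": 0.78, "lm_head": 0.90},
}
print(select_layers(scores))  # {'layers.0.mlp': 'tuned', 'layers.1.mlp': 'base'}
```

Because `max` returns the first maximal item, ties go to whichever model appears first in `scores`; listing the base model first makes it the default on ties. Layers that are missing from any model (here `lm_head`) are left out of the mapping.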
### Model Composition
- Creates a new model starting from the base architecture
- Systematically replaces each layer with its highest-scoring counterpart according to that mapping
- Preserves model architecture while optimizing layer-wise performance
- Maintains compatibility with original tokenizer and configuration
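A sketch of the composition step on plain NumPy state dicts. `compose_model` is a hypothetical name, and the shape check is one way to enforce that the architecture is preserved; in a PyTorch workflow the dictionary would hold tensors and could be loaded into the base model with `load_state_dict`.

```python
import numpy as np


def compose_model(
    base_state: dict[str, np.ndarray],
    models: dict[str, dict[str, np.ndarray]],
    mapping: dict[str, str],
) -> dict[str, np.ndarray]:
    """Start from the base weights and swap in each mapped layer from its source model."""
    # Copy every base tensor so the base state dict itself is left unmodified.
    merged = {name: tensor.copy() for name, tensor in base_state.items()}
    for layer, source in mapping.items():
        replacement = models[source][layer]
        # Preserve the architecture: the replacement must match the base tensor's shape.
        if replacement.shape != merged[layer].shape:
            raise ValueError(
                f"{layer}: shape {replacement.shape} does not match base {merged[layer].shape}"
            )
        merged[layer] = replacement.copy()
    return merged


base = {"mlp.weight": np.zeros((2, 2)), "norm.weight": np.ones(2)}
tuned = {"mlp.weight": np.full((2, 2), 0.5), "norm.weight": np.ones(2)}
merged = compose_model(base, {"base": base, "tuned": tuned}, {"mlp.weight": "tuned"})
```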
### Output Generation
- Saves the composite model with complete weights and configuration
- Generates detailed merge reports documenting layer sources
- Copies the necessary tokenizer files from the base model
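One possible shape for the merge report, as a sketch: the JSON layout, the helper names, and the `merge_report.json` file name are assumptions, not the format the original tooling writes. With Hugging Face Transformers, the weights and tokenizer files could be written to the same directory by calling `save_pretrained` on the model and tokenizer.

```python
import json
from collections import Counter
from pathlib import Path


def build_merge_report(base_model: str, mapping: dict[str, str]) -> dict:
    """Summarize which model each layer was taken from (hypothetical report format)."""
    return {
        "base_model": base_model,
        "layer_sources": dict(sorted(mapping.items())),
        "layers_per_source": dict(Counter(mapping.values())),
    }


def write_merge_report(report: dict, out_dir: str) -> Path:
    """Write the report as JSON into the output directory and return its path."""
    path = Path(out_dir) / "merge_report.json"  # hypothetical file name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(report, indent=2))
    return path
```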
## Normalized Effective Rank (NER)
NER measures how effectively a neural network layer utilizes its available dimensions through entropy analysis of its singular value distribution. The calculation proceeds as follows:
1. **Singular Value Decomposition**
   - Input: weight matrix A ∈ R^(m×n)
   - Compute the singular values σᵢ, where σᵢ ≥ 0
   - Discard values at or below a numerical threshold, keeping only σᵢ > 1e-12
2. **Distribution Normalization**
   - Sum the retained singular values: S = Σσᵢ
   - Form a probability distribution: pᵢ = σᵢ/S
3. **Entropy Calculation**
   - Compute the Shannon entropy: H = -Σ(pᵢ * log₂(pᵢ))
   - Compute the maximum possible entropy: H_max = log₂(k), where k is the number of retained singular values
4. **Normalization**
   - Final NER score = H/H_max
   - Yields a value between 0 and 1
   - Higher scores indicate more uniform dimensional utilization
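The four steps map directly onto a few NumPy calls. This is a sketch, not the reference implementation: `normalized_effective_rank` is a hypothetical name, and returning 0.0 when fewer than two singular values survive the threshold (where H_max = log₂(1) = 0 and the ratio is undefined) is an assumption.

```python
import numpy as np


def normalized_effective_rank(A: np.ndarray, eps: float = 1e-12) -> float:
    """Normalized Effective Rank of a 2-D weight matrix."""
    # 1. Singular value decomposition: singular values only, all >= 0.
    sigma = np.linalg.svd(np.asarray(A, dtype=np.float64), compute_uv=False)
    # Keep only values above the numerical threshold.
    sigma = sigma[sigma > eps]
    k = sigma.size
    # Assumption: with fewer than two retained values H_max is 0, so return 0.0.
    if k < 2:
        return 0.0
    # 2. Distribution normalization: p_i = sigma_i / S.
    p = sigma / sigma.sum()
    # 3. Shannon entropy in bits, and its maximum over k outcomes.
    H = -np.sum(p * np.log2(p))
    H_max = np.log2(k)
    # 4. Normalize to the range [0, 1].
    return float(H / H_max)


print(normalized_effective_rank(np.eye(4)))            # approximately 1.0
print(normalized_effective_rank(np.diag([3.0, 1.0])))  # approximately 0.811
```

A perfectly flat spectrum, as for the identity matrix, scores 1.0, while the skewed spectrum of diag(3, 1) (p = 0.75, 0.25) scores about 0.811.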