---
license: apache-2.0
base_model:
- Qwen/Qwen2.5-7B
pipeline_tag: text-generation
tags:
- not-for-all-audiences
language:
- en
---

## Model Description

This model is a merge built by analyzing each layer of the source models and selecting the best candidates by dimensional utilization efficiency. The process follows these steps:

Layer Analysis
- Downloads base and fine-tuned models from Hugging Face Hub
- Calculates Normalized Effective Rank (NER) for each layer
- NER measures how effectively each layer utilizes its dimensions through entropy analysis of singular value distributions
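
The NER score in the last bullet can be sketched in a few lines. This is a minimal illustration, not the merge tool's actual code: it assumes NumPy rather than whatever framework was used, the function name `normalized_effective_rank` is hypothetical, and it assumes the maximum entropy is taken over the singular values that survive the numerical threshold.

```python
import numpy as np


def normalized_effective_rank(weight, eps=1e-12):
    """Entropy of the normalized singular values, scaled to [0, 1].

    Hypothetical helper for illustration; expects a 1-D or 2-D array.
    """
    a = np.asarray(weight, dtype=np.float64)
    if a.ndim == 1:
        a = a[np.newaxis, :]  # treat a vector as a 1 x n matrix

    # Singular values only; the U and V factors are not needed.
    s = np.linalg.svd(a, compute_uv=False)
    s = s[s > eps]  # keep values above the numerical threshold

    if s.size <= 1:
        # With zero or one value, log2(size) is 0 and the ratio is undefined.
        # Returning 0.0 here is an assumption of this sketch.
        return 0.0

    p = s / s.sum()                    # probability distribution
    entropy = -np.sum(p * np.log2(p))  # Shannon entropy in bits
    max_entropy = np.log2(s.size)      # entropy of a uniform distribution
    return float(entropy / max_entropy)
```

With this sketch, a 4×4 identity matrix scores 1.0 because all singular values are equal, while a matrix with a single non-zero singular value scores 0.0.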

Layer Selection
- Identifies common layer structures across models
- Ranks layers based on their NER scores
- Selects the highest-scoring version of each layer across models
- Creates a mapping of optimal layer sources
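
The selection step amounts to an arg-max per shared layer. The sketch below assumes the NER scores have already been collected into a nested dictionary (model name → layer name → score); that layout and the name `select_layers` are illustrative assumptions, not the tool's real data structures.

```python
def select_layers(ner_scores):
    """Map each layer shared by all models to the model with the highest NER.

    ner_scores: {model_name: {layer_name: score}} (assumed layout).
    Ties go to the model listed first, so list the base model first
    if it should win ties.
    """
    models = list(ner_scores)
    if not models:
        return {}

    # Only layers present in every model can be swapped safely.
    common = set(ner_scores[models[0]])
    for name in models[1:]:
        common &= set(ner_scores[name])

    return {
        layer: max(models, key=lambda m: ner_scores[m][layer])
        for layer in sorted(common)
    }


# Example with made-up scores: layer "c" is skipped because only one model has it.
scores = {
    "base": {"a": 0.5, "b": 0.9, "c": 0.1},
    "finetuned": {"a": 0.7, "b": 0.2},
}
layer_sources = select_layers(scores)
```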

Model Composition
- Creates a new model starting from the base architecture
- Replaces each layer with its highest-scoring counterpart from the layer mapping
- Preserves the model architecture while maximizing NER layer by layer
- Maintains compatibility with original tokenizer and configuration
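
One way to picture the composition step: copy the base weights, then overwrite each mapped layer with the tensor from its selected source. The sketch below uses plain dictionaries as stand-ins for framework state dicts; `compose_state_dict` and the `"base"` key are hypothetical names, and the shape check is an added safeguard rather than something the description above specifies.

```python
def compose_state_dict(state_dicts, layer_sources, base="base"):
    """Build a merged weight dictionary from per-layer source choices.

    state_dicts:   {model_name: {layer_name: tensor_like}} (assumed layout)
    layer_sources: {layer_name: model_name}, e.g. the selection mapping
    """
    merged = dict(state_dicts[base])  # shallow copy; the base is not mutated

    for layer, source in layer_sources.items():
        candidate = state_dicts[source][layer]
        # Keep the architecture intact: a replacement must match the base shape.
        base_shape = getattr(merged[layer], "shape", None)
        cand_shape = getattr(candidate, "shape", None)
        if base_shape != cand_shape:
            raise ValueError(f"shape mismatch for layer {layer!r}")
        merged[layer] = candidate

    return merged
```

Layers that are absent from the mapping keep their base weights, which is what keeps the result loadable with the base configuration.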

Output Generation
- Saves the composite model with complete weights and configuration
- Generates detailed merge reports documenting layer sources
- Copies necessary tokenizer files from base model
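
The reporting and tokenizer steps need only the standard library. The following is a sketch under stated assumptions: the report name `merge_report.json`, the function `write_outputs`, and the list of tokenizer file names are all illustrative, and saving the weights themselves is left out because it depends on the framework.

```python
import json
import shutil
from pathlib import Path

# Assumed file names; which of these exist depends on the tokenizer.
TOKENIZER_FILES = (
    "tokenizer.json",
    "tokenizer_config.json",
    "vocab.json",
    "merges.txt",
    "special_tokens_map.json",
)


def write_outputs(base_dir, out_dir, layer_sources):
    """Write a merge report and copy tokenizer files from the base model.

    Returns the names of the tokenizer files that were copied.
    """
    base_dir, out_dir = Path(base_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Record which model each layer came from.
    report = {"layer_sources": layer_sources}
    (out_dir / "merge_report.json").write_text(
        json.dumps(report, indent=2, sort_keys=True)
    )

    copied = []
    for name in TOKENIZER_FILES:
        src = base_dir / name
        if src.is_file():  # skip files this tokenizer does not use
            shutil.copy2(src, out_dir / name)
            copied.append(name)
    return copied
```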

NER measures how effectively a neural network layer utilizes its available dimensions through entropy analysis of its singular value distribution. The calculation proceeds as follows:

1. **Singular Value Decomposition**
   - Input: Weight matrix A ∈ R^(m×n)
   - Compute singular values σᵢ where σᵢ ≥ 0
   - Keep only values above the numerical threshold (> 1e-12)

2. **Distribution Normalization**
   - Sum all singular values: S = Σσᵢ
   - Create probability distribution: pᵢ = σᵢ/S
   
3. **Entropy Calculation**
   - Compute Shannon entropy: H = -Σ(pᵢ * log₂(pᵢ))
   - Calculate maximum possible entropy: H_max = log₂(k),
     where k is the number of singular values in the distribution (not the column count n of A)

4. **Normalization**
   - Final NER score = H/H_max
   - Results in value between 0 and 1
   - Higher scores indicate more uniform dimen