metadata
license: apache-2.0
base_model:
  - Qwen/Qwen2.5-7B
pipeline_tag: text-generation
tags:
  - not-for-all-audiences
language:
  - en

Model Description

This model was created by analyzing the layers of several models and selecting, for each layer, the version with the best dimensional utilization efficiency. The process follows these steps:

Layer Analysis

  • Downloads base and fine-tuned models from Hugging Face Hub
  • Calculates Normalized Effective Rank (NER) for each layer
  • NER measures how effectively each layer utilizes its dimensions through entropy analysis of singular value distributions

Layer Selection

  • Identifies common layer structures across models
  • Ranks layers based on their NER scores
  • Selects highest-performing layers from each model
  • Creates a mapping of optimal layer sources
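The selection steps above can be sketched as follows. This is a minimal illustration, not the author's actual script: the function name `select_best_sources`, the nested-dict input format, and the example scores are all hypothetical, and "common layer structures" is assumed here to mean layer names present in every model.

```python
def select_best_sources(ner_scores):
    """Map each common layer name to the model with the highest NER.

    ner_scores: {model_name: {layer_name: ner_score}} (hypothetical format).
    """
    if not ner_scores:
        return {}
    # Assumption: "common" layers are those whose names appear in every model.
    common_layers = set.intersection(*(set(s) for s in ner_scores.values()))
    mapping = {}
    for layer in sorted(common_layers):
        # On ties, max() keeps the first model in dict insertion order.
        mapping[layer] = max(ner_scores, key=lambda m: ner_scores[m][layer])
    return mapping


# Made-up scores, for illustration only.
example_scores = {
    "base": {"layers.0": 0.61, "layers.1": 0.70, "lm_head": 0.50},
    "finetune_a": {"layers.0": 0.66, "layers.1": 0.68},
}
layer_sources = select_best_sources(example_scores)
```

In this sketch, listing the base model first means it wins ties, which keeps the composite closer to the base architecture when scores are equal.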

Model Composition

  • Creates a new model starting from the base architecture
  • Systematically replaces layers with their highest-performing counterparts
  • Preserves model architecture while optimizing layer-wise performance
  • Maintains compatibility with original tokenizer and configuration
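A rough sketch of the composition step, assuming each model is available as a dictionary from parameter name to tensor (as with a PyTorch `state_dict`) and that a "layer" corresponds to one such entry. The function `compose_state_dict` and its arguments are hypothetical names used only for illustration; plain lists stand in for tensors.

```python
def compose_state_dict(base_state, donor_states, layer_sources, base_name="base"):
    """Start from the base weights and swap in the selected donor layers.

    base_state:    {layer_name: tensor} for the base model
    donor_states:  {model_name: {layer_name: tensor}} for the other models
    layer_sources: {layer_name: model_name}, the selection mapping
    """
    composite = dict(base_state)  # shallow copy; the base dict is not modified
    for layer, source in layer_sources.items():
        if source == base_name:
            continue  # the base version is already in place
        if layer not in base_state:
            # Keep the architecture unchanged: never add layers the base lacks.
            raise KeyError(f"{layer!r} is not part of the base architecture")
        composite[layer] = donor_states[source][layer]
    return composite


# Toy stand-ins for weight tensors, for illustration only.
base = {"layers.0": [0.0, 0.0], "layers.1": [1.0, 1.0]}
donors = {"finetune_a": {"layers.0": [9.0, 9.0], "layers.1": [8.0, 8.0]}}
sources = {"layers.0": "finetune_a", "layers.1": "base"}
merged = compose_state_dict(base, donors, sources)
```

Because the composite keeps exactly the base model's keys, the base configuration and tokenizer remain applicable. A real implementation would likely also check tensor shapes before swapping.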

Output Generation

  • Saves the composite model with complete weights and configuration
  • Generates detailed merge reports documenting layer sources
  • Copies necessary tokenizer files from base model

Normalized Effective Rank (NER) measures how effectively a neural network layer utilizes its available dimensions, using the entropy of its singular value distribution. The calculation proceeds as follows:

  1. Singular Value Decomposition

    • Input: Weight matrix A ∈ R^(m×n)
    • Compute singular values σᵢ where σᵢ ≥ 0
    • Filter values above numerical threshold (>1e-12)
  2. Distribution Normalization

    • Sum all singular values: S = Σσᵢ
    • Create probability distribution: pᵢ = σᵢ/S
  3. Entropy Calculation

    • Compute Shannon entropy: H = -Σ(pᵢ * log₂(pᵢ))
    • Calculate maximum possible entropy: H_max = log₂(k), where k is the number of singular values in the distribution (not the matrix dimension n)
  4. Normalization

    • Final NER score = H/H_max
    • Results in value between 0 and 1
    • Higher scores indicate more uniform dimen
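
The four calculation steps can be sketched in Python as below. This is an illustrative reading of the description, not the author's code: the function names are hypothetical, the entropy maximum is assumed to use the count of singular values that survive the threshold, and returning 0.0 when fewer than two values survive is an assumption made to avoid dividing by zero. The SVD wrapper assumes NumPy is installed.

```python
import math


def ner_from_singular_values(singular_values, threshold=1e-12):
    """Normalized Effective Rank computed from a sequence of singular values."""
    # Step 1 (filtering): keep values above the numerical threshold.
    kept = [float(s) for s in singular_values if s > threshold]
    if len(kept) < 2:
        # Assumption: H_max = log2(1) = 0 here, so report 0.0 instead of dividing.
        return 0.0
    # Step 2: normalize into a probability distribution p_i = sigma_i / S.
    total = sum(kept)
    probs = [s / total for s in kept]
    # Step 3: Shannon entropy and its maximum for this many values.
    entropy = -sum(p * math.log2(p) for p in probs)
    max_entropy = math.log2(len(kept))
    # Step 4: normalize to the range [0, 1].
    return entropy / max_entropy


def ner_from_matrix(weight):
    """Step 1 (SVD) for a 2-D weight matrix, then the steps above."""
    import numpy as np  # imported lazily so the pure-Python part runs without it

    sigmas = np.linalg.svd(np.asarray(weight, dtype=np.float64), compute_uv=False)
    return ner_from_singular_values(sigmas.tolist())


uniform = ner_from_singular_values([1.0, 1.0, 1.0, 1.0])  # equal singular values
skewed = ner_from_singular_values([3.0, 1.0])             # one dominant direction
rank_one = ner_from_singular_values([5.0, 0.0])           # single surviving value
```

Equal singular values give the maximum score, while a spectrum dominated by a few large values scores lower.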