Experimental global target bits‑per‑weight quantization of ServiceNow-AI/Apriel-1.6-15b-Thinker and zai-org/GLM-4.6V-Flash
These experimental versions were produced with custom builds of llama-imatrix and llama-quantize. The modified llama-imatrix generates an importance matrix (imatrix) that also records the mean activations. The modified llama-quantize computes, for each tensor, a weighted mean squared quantization error plus a bias/projection term (when the imatrix includes activations), and uses these scores to automatically select the lowest-error quantization recipe that meets a global target bits‑per‑weight (bpw).
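As a rough illustration of how such a recipe search could work, here is a minimal Python sketch. The error model (imatrix-weighted MSE plus a bias term projected onto the mean activations) and the greedy bit-budget allocation are assumptions for illustration only; `QuantCandidate`, `tensor_error`, and `select_recipe` are hypothetical names, not part of llama.cpp.

```python
# Minimal sketch of a global-bpw recipe search. All names and the greedy
# strategy are illustrative assumptions, not the actual llama-quantize code.
from dataclasses import dataclass
import numpy as np

@dataclass
class QuantCandidate:
    qtype: str          # e.g. "Q4_K" (hypothetical label)
    bpw: float          # bits per weight of this quant type
    w_hat: np.ndarray   # weights after a quantize/dequantize round trip

def tensor_error(w, w_hat, importance, mean_act):
    """Weighted MSE plus a bias/projection term from the mean activations."""
    r = w_hat - w                               # quantization residual
    wmse = float(np.mean(importance * r * r))   # imatrix-weighted squared error
    bias = float(np.mean((r @ mean_act) ** 2))  # systematic shift of the layer output
    return wmse + bias

def select_recipe(weights, candidates, importance, mean_act, target_bpw):
    """Pick one candidate per tensor so the size-weighted mean bpw stays at or
    below target_bpw, greedily spending bits where they cut error the most."""
    sizes = {n: w.size for n, w in weights.items()}
    total = sum(sizes.values())
    # Start every tensor at its cheapest available type.
    choice = {n: min(cs, key=lambda c: c.bpw) for n, cs in candidates.items()}

    def err(n, c):
        return tensor_error(weights[n], c.w_hat, importance[n], mean_act[n])

    while True:
        used = sum(choice[n].bpw * sizes[n] for n in weights) / total
        best = None
        for n, cs in candidates.items():
            for c in cs:
                extra = (c.bpw - choice[n].bpw) * sizes[n] / total
                if extra <= 0 or used + extra > target_bpw:
                    continue  # not an upgrade, or blows the bit budget
                gain = (err(n, choice[n]) - err(n, c)) / extra  # error saved per bit
                if gain > 0 and (best is None or gain > best[0]):
                    best = (gain, n, c)
        if best is None:
            return choice  # no affordable upgrade reduces error further
        choice[best[1]] = best[2]
```

Given a quantize/dequantize round trip that supplies `w_hat` for each candidate type, the returned mapping is a per-tensor recipe whose size-weighted average stays at or below the global bpw target.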
More information in the model cards:
eaddario/Apriel-1.6-15b-Thinker-GGUF
eaddario/GLM-4.6V-Flash-GGUF