---
base_model: Qwen/Qwen2.5-Coder-0.5B
datasets: None
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- torch
- trl
- unsloth
- llama
- gguf
---
# Uploaded model
- **Model:** Quantized_Qwen-2.5-Coding-0.5B_Fp16Q2_mixed_selective
- **Developed by:** student-abdullah
- **License:** apache-2.0
- **Quantized from model:** Qwen2.5-Coder-0.5B
- **Created on:** 14th July, 2025

---
# Acknowledgement
<div style="display: flex; gap: 10px; align-items: center;">
  <img src="https://colab.research.google.com/img/colab_favicon_256px.png" width="200"/>
  <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/e/ef/ChatGPT-Logo.svg/2048px-ChatGPT-Logo.svg.png" width="140"/>
  <img src="https://compareaimodels.com/content/images/2024/08/qwen-square.svg" width="200"/>
</div>

---
# Quantization Description
This model was produced from the Qwen2.5-Coder-0.5B base model using *selective quantization*. The goal is to increase inference speed while preserving the model's ability to generate relevant and accurate responses to Python programming tasks.

The following layers were kept at *16-bit* (FP16) precision:
- `q_proj`
- `v_proj`
- `o_proj`
- `down_proj`
- `lm_head`

All remaining layers were quantized to *Q2* (2-bit).
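
The layer-selection rule above can be sketched as a lookup on parameter names. This is a minimal illustration, not the script used to build this model. The helper name `target_precision` and the use of Hugging Face parameter names (as in the architecture printout) are assumptions. The sketch applies the card's wording literally, so small 1-D tensors such as RMSNorm weights also map to Q2. The card does not state how those tensors were actually stored.

```python
# Hypothetical helper: choose the precision a parameter receives under the
# scheme described in this card (FP16 for the listed layers, Q2 otherwise).
# Names follow Hugging Face module paths, e.g.
# "model.layers.0.self_attn.q_proj.weight".

FP16_LAYERS = {"q_proj", "v_proj", "o_proj", "down_proj", "lm_head"}


def target_precision(param_name: str) -> str:
    """Return "fp16" if any dotted component names an FP16 layer, else "q2"."""
    # Matching whole components (not substrings) keeps "q_proj" from
    # accidentally matching unrelated names.
    components = param_name.split(".")
    return "fp16" if any(c in FP16_LAYERS for c in components) else "q2"


if __name__ == "__main__":
    for name in (
        "model.layers.0.self_attn.q_proj.weight",
        "model.layers.0.self_attn.k_proj.weight",
        "model.layers.0.mlp.down_proj.weight",
        "model.layers.0.mlp.gate_proj.weight",
        "lm_head.weight",
        "model.embed_tokens.weight",
    ):
        print(f"{name:45s} -> {target_precision(name)}")
```

Matching on exact module names keeps the rule easy to audit against the five layers listed above.
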

---
# Model Description
| Layer Name                   | Role (Short)                                                                         | Type           |
| ---------------------------- | ------------------------------------------------------------------------------------ | -------------- |
| `q_proj`, `k_proj`, `v_proj` | Compute the query, key, and value projections for attention                          | Attention Proj |
| `o_proj`                     | Projects the attention output back to the model hidden size                          | Attention Proj |
| `down_proj`                  | Projects the gated MLP activations back down to the hidden size (4864 → 896)         | MLP            |
| `gate_proj`                  | Gate branch of the gated (SwiGLU) MLP; its SiLU output scales `up_proj` element-wise | MLP            |
| `up_proj`                    | Projects the hidden state up to the MLP intermediate size (896 → 4864)               | MLP            |
| `lm_head`                    | Final linear layer mapping hidden states to vocabulary logits                        | Output Head    |
| `embed_tokens`               | Maps token IDs to 896-dimensional embeddings                                         | Input Embed    |
| `norm`                       | Final RMSNorm, applied before `lm_head`                                              | Normalization  |
| `*_layernorm`                | RMSNorm on the inputs of the attention and MLP sub-blocks                            | Normalization  |
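
The three MLP rows of the table describe a single data flow. The sketch below assumes the Hugging Face `Qwen2MLP` forward pass, `down_proj(act_fn(gate_proj(x)) * up_proj(x))` with SiLU as `act_fn`. It uses plain Python lists and toy 2 → 3 → 2 sizes instead of tensors. The function names are illustrative. The sketch only shows the roles of the layers and says nothing about how the quantized weights behave.

```python
# Toy, pure-Python sketch of the gated MLP data flow in each decoder layer:
#   out = down_proj(silu(gate_proj(x)) * up_proj(x))
# Biases are omitted because the printed Qwen2MLP layers use bias=False.
# Sizes are toy values (hidden=2, intermediate=3), not 896 and 4864.
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major, shape (out_features, in_features)


def linear(weight: Matrix, x: Vector) -> Vector:
    """Compute W @ x, like torch.nn.Linear(bias=False) with weight shape (out, in)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def silu(v: float) -> float:
    """SiLU activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def gated_mlp(x: Vector, gate_w: Matrix, up_w: Matrix, down_w: Matrix) -> Vector:
    gate = [silu(g) for g in linear(gate_w, x)]  # hidden -> intermediate, then SiLU
    up = linear(up_w, x)                          # hidden -> intermediate
    gated = [g * u for g, u in zip(gate, up)]     # element-wise gating
    return linear(down_w, gated)                  # intermediate -> hidden


if __name__ == "__main__":
    gate_w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # shape (3, 2)
    up_w = [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0]]   # shape (3, 2)
    down_w = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]    # shape (2, 3)
    out = gated_mlp([1.0, 2.0], gate_w, up_w, down_w)
    print(out)  # two values: the output is back at the toy hidden size
```

In this scheme `down_proj` is kept at FP16 while `gate_proj` and `up_proj` are quantized to Q2.
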

---
# Model Architecture
<pre><code>Qwen2ForCausalLM(
  (model): Qwen2Model(
    (embed_tokens): Embedding(151936, 896, padding_idx=151665)
    (layers): ModuleList(
      (0-23): 24 x Qwen2DecoderLayer(
        (self_attn): Qwen2Attention(
          (q_proj): Linear(in_features=896, out_features=896, bias=True)
          (k_proj): Linear(in_features=896, out_features=128, bias=True)
          (v_proj): Linear(in_features=896, out_features=128, bias=True)
          (o_proj): Linear(in_features=896, out_features=896, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): Qwen2MLP(
          (gate_proj): Linear(in_features=896, out_features=4864, bias=False)
          (up_proj): Linear(in_features=896, out_features=4864, bias=False)
          (down_proj): Linear(in_features=4864, out_features=896, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): Qwen2RMSNorm((896,), eps=1e-06)
        (post_attention_layernorm): Qwen2RMSNorm((896,), eps=1e-06)
      )
    )
    (norm): Qwen2RMSNorm((896,), eps=1e-06)
    (rotary_emb): LlamaRotaryEmbedding()
  )
  (lm_head): Linear(in_features=896, out_features=151936, bias=False)
)</code></pre>
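
The shapes in the printout above are enough to tally how many weights fall into each precision group. This is a back-of-the-envelope sketch and not output from the conversion tool. It counts weight-matrix entries only, leaving out biases and RMSNorm weights. It also assigns `embed_tokens` to Q2 by reading "all remaining layers" literally. The printout lists `lm_head` separately. If the checkpoint uses tied word embeddings (the `tie_word_embeddings` option in the Hugging Face config), `lm_head` and `embed_tokens` share one matrix, and that vocabulary-sized block is counted in both groups here. The card does not say how the conversion resolved that overlap.

```python
# Count weight-matrix entries per precision group, using the shapes printed
# in the architecture above. Biases and RMSNorm weights are ignored.
HIDDEN, INTERMEDIATE, KV_OUT, VOCAB, N_LAYERS = 896, 4864, 128, 151936, 24

FP16_LAYERS = {"q_proj", "v_proj", "o_proj", "down_proj", "lm_head"}

# (dim_a, dim_b) pairs; only their product matters for counting.
PER_LAYER_SHAPES = {  # repeated in each of the 24 decoder layers
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}
GLOBAL_SHAPES = {  # appear once, outside the decoder stack
    "embed_tokens": (VOCAB, HIDDEN),
    "lm_head": (HIDDEN, VOCAB),
}


def count_by_precision() -> dict:
    """Return {"fp16": n, "q2": m}, the number of matrix weights in each group."""
    counts = {"fp16": 0, "q2": 0}
    for name, (a, b) in PER_LAYER_SHAPES.items():
        group = "fp16" if name in FP16_LAYERS else "q2"
        counts[group] += N_LAYERS * a * b
    for name, (a, b) in GLOBAL_SHAPES.items():
        group = "fp16" if name in FP16_LAYERS else "q2"
        counts[group] += a * b
    return counts


if __name__ == "__main__":
    counts = count_by_precision()
    total = counts["fp16"] + counts["q2"]
    for group, n in counts.items():
        print(f"{group}: {n:,} weights ({n / total:.1%})")
```

Because `gate_proj` and `up_proj` are the largest per-layer matrices, most of the decoder-stack weights land in the Q2 group even though five layer types stay at FP16.
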

---
# Performance & Limitations
- Not yet examined.

---
# Model Performance Evaluation
- Not yet evaluated.