minor
README.md
CHANGED

@@ -11,6 +11,22 @@ Qwen2-7B-ReLU is a variant of Qwen2-7B that replaces the SiLU/Swish activation f
 - Maintains performance comparable to or even better than the original Qwen2-7B
 - Significantly increases activation sparsity, enabling further optimization and compression
 
+## Benchmarks
+
+The model has been evaluated on standard benchmarks to verify its performance:
+
+- **MMLU**: 69.19% (5-shot)
+- **IFEval**: 73.2% (Prompt Strict-Accuracy)
+- **Livebench**:
+  - Average: 32.1%
+  - Coding: 39.8%
+  - Data Analysis: 45.3%
+  - Instruction Following: 58.1%
+  - Language: 9.0%
+  - Math: 22.0%
+  - Reasoning: 18.7%
+
+These results demonstrate that the ReLU modification maintains competitive performance while achieving higher sparsity compared to the original model.
 ## Technical Details
 
 The key modification in this version is the application of ReLU activation to both branches in the MLP block. The implementation modifies the original `Qwen2MLP` class as follows:
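The modified forward pass itself sits between the two hunks and is not shown in this diff. As a rough sketch of what the paragraph above describes (not the repository's actual code), the block below mirrors the standard `Qwen2MLP` layout from Hugging Face Transformers (`gate_proj`, `up_proj`, `down_proj`) and applies ReLU to both branches; the class name `Qwen2MLPReLU` and the constructor arguments are illustrative assumptions.

```python
import torch
import torch.nn as nn


class Qwen2MLPReLU(nn.Module):
    """Illustrative Qwen2-style MLP with ReLU on both branches (sketch, not the released code)."""

    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)
        self.act_fn = nn.ReLU()  # replaces the original SiLU/Swish gate activation

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # ReLU is applied to the gate branch *and* the up branch before the
        # element-wise product, so the value fed to down_proj is exactly zero
        # wherever either branch is non-positive.
        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.act_fn(self.up_proj(x)))
```

Zeroing both branches is what drives the sparsity claim above: the intermediate activation is exactly zero wherever either pre-activation is non-positive, which sparsity-aware kernels and compression methods can exploit.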

@@ -63,22 +79,7 @@ outputs = model.generate(**inputs)
 response = tokenizer.decode(outputs[0])
 ```
 
-## Benchmarks
-
-The model has been evaluated on standard benchmarks to verify its performance:
-
-- **MMLU**: 69.19% (5-shot)
-- **IFEval**: 73.2% (Prompt Strict-Accuracy)
-- **Livebench**:
-  - Average: 32.1%
-  - Coding: 39.8%
-  - Data Analysis: 45.3%
-  - Instruction Following: 58.1%
-  - Language: 9.0%
-  - Math: 22.0%
-  - Reasoning: 18.7%
-
-These results demonstrate that the ReLU modification maintains competitive performance while achieving higher sparsity compared to the original model.
+
 
 ## Citation
 
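Both the feature list and the closing benchmark remark attribute the value of this change to higher activation sparsity. As a hypothetical way to sanity-check that claim (not something this repository ships), the sketch below hooks the input of each decoder layer's `down_proj` in a Transformers `Qwen2ForCausalLM`-style model and reports the fraction of exactly-zero activations; the checkpoint id, prompt, and reporting format are placeholders.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Qwen2-7B-ReLU"  # placeholder: substitute the actual checkpoint id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
model.eval()

zero_fractions = []

def record_sparsity(module, inputs, output):
    # inputs[0] is the tensor entering down_proj, i.e. the post-activation
    # product of the two MLP branches; with ReLU on both branches a large
    # fraction of its entries should be exactly zero.
    act = inputs[0]
    zero_fractions.append((act == 0).float().mean().item())

# Qwen2ForCausalLM -> .model (Qwen2Model) -> .layers (decoder blocks)
handles = [
    layer.mlp.down_proj.register_forward_hook(record_sparsity)
    for layer in model.model.layers
]

with torch.no_grad():
    batch = tokenizer("Give me a short introduction to large language models.",
                      return_tensors="pt")
    model(**batch)

for handle in handles:
    handle.remove()

print(f"Mean fraction of zero MLP activations: {sum(zero_fractions) / len(zero_fractions):.3f}")
```

Running the same hook on a SiLU baseline would report essentially no exact zeros, which is the contrast the sparsity claim rests on.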