Update README.md
README.md CHANGED

## Model Summary

PowerMoE-3B is a 3B-parameter sparse Mixture-of-Experts (sMoE) language model trained with the Power learning rate scheduler. It sparsely activates 800M parameters for each token. It is trained on a wide range of open-source and synthetic datasets with permissive licenses. PowerMoE-3B has shown promising results compared to dense models with 2x the active parameters across various benchmarks, including natural-language multiple-choice tasks, code generation, and math reasoning.

Paper: https://arxiv.org/abs/2408.13359

## Usage

Note: requires a custom branch of transformers: https://github.com/mayank31398/transformers/tree/granitemoe

### Generation

This is a simple example of how to use the **PowerMoE-3b** model.
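
Below is a minimal generation sketch using the standard `transformers` causal-LM API. It assumes the custom branch linked above has been installed; the repo ID `ibm/PowerMoE-3b`, the prompt, and the dtype/device settings are illustrative assumptions rather than part of the original card.

```python
# Minimal generation sketch. Assumes transformers was installed from the custom
# branch noted above, e.g.:
#   pip install git+https://github.com/mayank31398/transformers.git@granitemoe
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "ibm/PowerMoE-3b"  # assumed Hugging Face repo ID, for illustration

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,  # optional: reduces memory for the 3B model
    device_map="auto",           # place weights on the available GPU(s)/CPU
)
model.eval()

prompt = "Write a function that returns the maximum value in a list of numbers."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=100)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```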