Update README.md
README.md CHANGED
@@ -120,11 +120,13 @@ vLLM also supports OpenAI-compatible serving. See the [documentation](https://do
 
 ## Evaluation
 
-The model was evaluated on the OpenLLM leaderboard tasks (version 1), using [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness).
+The model was evaluated on the OpenLLM leaderboard tasks (versions 1 and 2), using [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness), and on reasoning tasks using [lighteval](https://github.com/neuralmagic/lighteval/tree/reasoning).
+
+[vLLM](https://docs.vllm.ai/en/stable/) was used for all evaluations.
 
 <details>
 <summary>Evaluation details</summary>
 
+**lm-evaluation-harness**
 ```
 lm_eval \
   --model vllm \
@@ -134,6 +136,78 @@ The model was evaluated on the OpenLLM leaderboard tasks (version 1), using [lm-
   --fewshot_as_multiturn \
   --batch_size auto
 ```
+
+```
+lm_eval \
+  --model vllm \
+  --model_args pretrained="RedHatAI/Qwen3-0.6B-FP8-dynamic",dtype=auto,gpu_memory_utilization=0.5,max_model_len=8192,enable_chunked_prefill=True,tensor_parallel_size=2 \
+  --tasks mgsm \
+  --apply_chat_template \
+  --batch_size auto
+```
+
+```
+lm_eval \
+  --model vllm \
+  --model_args pretrained="RedHatAI/Qwen3-0.6B-FP8-dynamic",dtype=auto,gpu_memory_utilization=0.5,max_model_len=16384,enable_chunked_prefill=True,tensor_parallel_size=2 \
+  --tasks leaderboard \
+  --apply_chat_template \
+  --fewshot_as_multiturn \
+  --batch_size auto
+```
+
+**lighteval**
+
+lighteval_model_arguments.yaml
+```yaml
+model_parameters:
+  model_name: RedHatAI/Qwen3-0.6B-FP8-dynamic
+  dtype: auto
+  gpu_memory_utilization: 0.9
+  max_model_length: 40960
+  generation_parameters:
+    temperature: 0.6
+    top_k: 20
+    min_p: 0.0
+    top_p: 0.95
+    max_new_tokens: 32768
+```
+
+```
+lighteval vllm \
+  --model_args lighteval_model_arguments.yaml \
+  --tasks "lighteval|aime24|0|0" \
+  --use_chat_template
+```
+
+```
+lighteval vllm \
+  --model_args lighteval_model_arguments.yaml \
+  --tasks "lighteval|aime25|0|0" \
+  --use_chat_template
+```
+
+```
+lighteval vllm \
+  --model_args lighteval_model_arguments.yaml \
+  --tasks "lighteval|math_500|0|0" \
+  --use_chat_template
+```
+
+```
+lighteval vllm \
+  --model_args lighteval_model_arguments.yaml \
+  --tasks "lighteval|gpqa:diamond|0|0" \
+  --use_chat_template
+```
+
+```
+lighteval vllm \
+  --model_args lighteval_model_arguments.yaml \
+  --tasks "extended|lcb:codegeneration" \
+  --use_chat_template
+```
+
 </details>
 
 ### Accuracy
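The lighteval `--tasks` arguments in the commands above are pipe-separated specifiers such as `lighteval|aime24|0|0`. A minimal sketch of how such a specifier decomposes, assuming lighteval's `suite|task|num_fewshot|truncate_fewshots` convention (the field names and the defaults for the two-part form are this sketch's assumptions, not part of lighteval's API):

```python
# Illustrative helper, not part of lighteval: split a pipe-separated task
# specifier like "lighteval|aime24|0|0" into its assumed components
# (suite, task, few-shot count, few-shot truncation flag).
from dataclasses import dataclass


@dataclass
class TaskSpec:
    suite: str           # task suite, e.g. "lighteval" or "extended"
    task: str            # task name, possibly with a subset after ":", e.g. "gpqa:diamond"
    num_fewshot: int     # number of few-shot examples
    truncate_fewshots: bool  # whether few-shot examples may be truncated


def parse_task_spec(spec: str) -> TaskSpec:
    parts = spec.split("|")
    if len(parts) == 4:
        suite, task, shots, trunc = parts
    elif len(parts) == 2:
        # e.g. "extended|lcb:codegeneration"; zero-shot defaults are assumed here
        suite, task = parts
        shots, trunc = "0", "0"
    else:
        raise ValueError(f"unrecognized task spec: {spec!r}")
    return TaskSpec(suite, task, int(shots), trunc == "1")


if __name__ == "__main__":
    for s in ["lighteval|aime24|0|0", "lighteval|gpqa:diamond|0|0", "extended|lcb:codegeneration"]:
        print(parse_task_spec(s))
```

Note that the pipes must be quoted on the command line (as in the commands above), since an unquoted `|` would be interpreted by the shell as a pipeline.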