alexmarques committed
Commit 068a904 · verified · Parent: 06c5c33

Update README.md

Files changed (1): README.md (+75 −1)

README.md CHANGED
@@ -120,11 +120,13 @@ vLLM also supports OpenAI-compatible serving. See the [documentation](https://do
 
 ## Evaluation
 
-The model was evaluated on the OpenLLM leaderboard tasks (version 1), using [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) and [vLLM](https://docs.vllm.ai/en/stable/).
+The model was evaluated on the OpenLLM leaderboard tasks (versions 1 and 2), using [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness), and on reasoning tasks using [lighteval](https://github.com/neuralmagic/lighteval/tree/reasoning).
+[vLLM](https://docs.vllm.ai/en/stable/) was used for all evaluations.
 
 <details>
 <summary>Evaluation details</summary>
 
+**lm-evaluation-harness**
 ```
 lm_eval \
   --model vllm \
@@ -134,6 +136,78 @@ The model was evaluated on the OpenLLM leaderboard tasks (version 1), using [lm-
   --fewshot_as_multiturn \
   --batch_size auto
 ```
+
+```
+lm_eval \
+  --model vllm \
+  --model_args pretrained="RedHatAI/Qwen3-0.6B-FP8-dynamic",dtype=auto,gpu_memory_utilization=0.5,max_model_len=8192,enable_chunked_prefill=True,tensor_parallel_size=2 \
+  --tasks mgsm \
+  --apply_chat_template \
+  --batch_size auto
+```
+
+```
+lm_eval \
+  --model vllm \
+  --model_args pretrained="RedHatAI/Qwen3-0.6B-FP8-dynamic",dtype=auto,gpu_memory_utilization=0.5,max_model_len=16384,enable_chunked_prefill=True,tensor_parallel_size=2 \
+  --tasks leaderboard \
+  --apply_chat_template \
+  --fewshot_as_multiturn \
+  --batch_size auto
+```
+
+**lighteval**
+
+lighteval_model_arguments.yaml
+```yaml
+model_parameters:
+  model_name: RedHatAI/Qwen3-0.6B-FP8-dynamic
+  dtype: auto
+  gpu_memory_utilization: 0.9
+  max_model_length: 40960
+  generation_parameters:
+    temperature: 0.6
+    top_k: 20
+    min_p: 0.0
+    top_p: 0.95
+    max_new_tokens: 32768
+```
+
+```
+lighteval vllm \
+  --model_args lighteval_model_arguments.yaml \
+  --tasks "lighteval|aime24|0|0" \
+  --use_chat_template=true
+```
+
+```
+lighteval vllm \
+  --model_args lighteval_model_arguments.yaml \
+  --tasks "lighteval|aime25|0|0" \
+  --use_chat_template=true
+```
+
+```
+lighteval vllm \
+  --model_args lighteval_model_arguments.yaml \
+  --tasks "lighteval|math_500|0|0" \
+  --use_chat_template=true
+```
+
+```
+lighteval vllm \
+  --model_args lighteval_model_arguments.yaml \
+  --tasks "lighteval|gpqa:diamond|0|0" \
+  --use_chat_template=true
+```
+
+```
+lighteval vllm \
+  --model_args lighteval_model_arguments.yaml \
+  --tasks "extended|lcb:codegeneration" \
+  --use_chat_template=true
+```
+
 </details>
 
 ### Accuracy
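
The lighteval commands above pass task specifiers such as `lighteval|aime24|0|0`. As a rough sketch (this parser is hypothetical, not part of lighteval), the string appears to follow a `suite|task|num_fewshot|truncate` pattern, with the last two fields optional (e.g. `extended|lcb:codegeneration`):

```python
# Hypothetical illustration of the lighteval task-specifier format; the field
# names here are assumptions, not lighteval's own API.
def parse_task_spec(spec: str) -> dict:
    parts = spec.split("|")
    return {
        "suite": parts[0],
        "task": parts[1],
        "num_fewshot": int(parts[2]) if len(parts) > 2 else 0,
        "truncate_fewshot": parts[3] == "1" if len(parts) > 3 else False,
    }

print(parse_task_spec("lighteval|aime24|0|0"))
print(parse_task_spec("extended|lcb:codegeneration"))
```

Because `|` is a pipe in the shell, the specifier should be quoted when typed on a command line.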
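
The long `--model_args` lines in the lm_eval commands are comma-separated `key=value` pairs. A minimal helper (hypothetical, not part of lm-evaluation-harness) can assemble them from a plain dict, which is less error-prone than hand-editing the string:

```python
# Hypothetical helper: build the "key=value,key=value" string expected by
# lm_eval's --model_args flag. Values must not themselves contain commas.
def build_model_args(options: dict) -> str:
    return ",".join(f"{key}={value}" for key, value in options.items())

print(build_model_args({
    "pretrained": "RedHatAI/Qwen3-0.6B-FP8-dynamic",
    "dtype": "auto",
    "gpu_memory_utilization": 0.5,
    "max_model_len": 8192,
    "enable_chunked_prefill": True,
    "tensor_parallel_size": 2,
}))
```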