alexmarques committed
Commit 068a904 · verified · Parent: 06c5c33

Update README.md

Files changed (1): README.md (+75 −1)

README.md CHANGED
@@ -120,11 +120,13 @@ vLLM also supports OpenAI-compatible serving. See the [documentation](https://do
 
 ## Evaluation
 
-The model was evaluated on the OpenLLM leaderboard tasks (version 1), using [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) and [vLLM](https://docs.vllm.ai/en/stable/).
+The model was evaluated on the OpenLLM leaderboard tasks (versions 1 and 2), using [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness), and on reasoning tasks using [lighteval](https://github.com/neuralmagic/lighteval/tree/reasoning).
+[vLLM](https://docs.vllm.ai/en/stable/) was used for all evaluations.
 
 <details>
 <summary>Evaluation details</summary>
 
+**lm-evaluation-harness**
 ```
 lm_eval \
   --model vllm \
@@ -134,6 +136,78 @@ The model was evaluated on the OpenLLM leaderboard tasks (version 1), using [lm-
   --fewshot_as_multiturn \
   --batch_size auto
 ```
+
+```
+lm_eval \
+  --model vllm \
+  --model_args pretrained="RedHatAI/Qwen3-0.6B-FP8-dynamic",dtype=auto,gpu_memory_utilization=0.5,max_model_len=8192,enable_chunked_prefill=True,tensor_parallel_size=2 \
+  --tasks mgsm \
+  --apply_chat_template \
+  --batch_size auto
+```
+
+```
+lm_eval \
+  --model vllm \
+  --model_args pretrained="RedHatAI/Qwen3-0.6B-FP8-dynamic",dtype=auto,gpu_memory_utilization=0.5,max_model_len=16384,enable_chunked_prefill=True,tensor_parallel_size=2 \
+  --tasks leaderboard \
+  --apply_chat_template \
+  --fewshot_as_multiturn \
+  --batch_size auto
+```
+
+**lighteval**
+
+lighteval_model_arguments.yaml
+```yaml
+model_parameters:
+  model_name: RedHatAI/Qwen3-0.6B-FP8-dynamic
+  dtype: auto
+  gpu_memory_utilization: 0.9
+  max_model_length: 40960
+  generation_parameters:
+    temperature: 0.6
+    top_k: 20
+    min_p: 0.0
+    top_p: 0.95
+    max_new_tokens: 32768
+```
+
+```
+lighteval vllm \
+  --model_args lighteval_model_arguments.yaml \
+  --tasks "lighteval|aime24|0|0" \
+  --use_chat_template=true
+```
+
+```
+lighteval vllm \
+  --model_args lighteval_model_arguments.yaml \
+  --tasks "lighteval|aime25|0|0" \
+  --use_chat_template=true
+```
+
+```
+lighteval vllm \
+  --model_args lighteval_model_arguments.yaml \
+  --tasks "lighteval|math_500|0|0" \
+  --use_chat_template=true
+```
+
+```
+lighteval vllm \
+  --model_args lighteval_model_arguments.yaml \
+  --tasks "lighteval|gpqa:diamond|0|0" \
+  --use_chat_template=true
+```
+
+```
+lighteval vllm \
+  --model_args lighteval_model_arguments.yaml \
+  --tasks "extended|lcb:codegeneration" \
+  --use_chat_template=true
+```
+
 </details>
 
 ### Accuracy
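
The lighteval commands above pass task specifiers such as `lighteval|aime24|0|0`. As a rough sketch (this parser is hypothetical, not part of lighteval), the string appears to follow a `suite|task|num_fewshot|truncate` pattern, with the last two fields optional (e.g. `extended|lcb:codegeneration`):

```python
# Hypothetical illustration of the lighteval task-specifier format; the field
# names here are assumptions, not lighteval's own API.
def parse_task_spec(spec: str) -> dict:
    parts = spec.split("|")
    return {
        "suite": parts[0],
        "task": parts[1],
        "num_fewshot": int(parts[2]) if len(parts) > 2 else 0,
        "truncate_fewshot": parts[3] == "1" if len(parts) > 3 else False,
    }

print(parse_task_spec("lighteval|aime24|0|0"))
print(parse_task_spec("extended|lcb:codegeneration"))
```

Because `|` is a pipe in the shell, the specifier should be quoted when typed on a command line.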
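
The long `--model_args` lines in the lm_eval commands are comma-separated `key=value` pairs. A minimal helper (hypothetical, not part of lm-evaluation-harness) can assemble them from a plain dict, which is less error-prone than hand-editing the string:

```python
# Hypothetical helper: build the "key=value,key=value" string expected by
# lm_eval's --model_args flag. Values must not themselves contain commas.
def build_model_args(options: dict) -> str:
    return ",".join(f"{key}={value}" for key, value in options.items())

print(build_model_args({
    "pretrained": "RedHatAI/Qwen3-0.6B-FP8-dynamic",
    "dtype": "auto",
    "gpu_memory_utilization": 0.5,
    "max_model_len": 8192,
    "enable_chunked_prefill": True,
    "tensor_parallel_size": 2,
}))
```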