---
base_model:
- agentica-org/DeepCoder-14B-Preview
---

vllm (pretrained=/root/autodl-tmp/DeepCoder-14B-Preview,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.732|±  |0.0281|
|     |       |strict-match    |     5|exact_match|↑  |0.856|±  |0.0222|

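
The header line above records the harness settings for this run. The source does not give the exact command that was used, so the following is only a minimal sketch of how those settings might map onto an `lm_eval` command line, assuming the standard lm-evaluation-harness CLI. The helper `build_command` is hypothetical and exists purely for illustration; it prints the command rather than running it.

```python
import shlex

# Settings copied from the header line above; the path is the author's local checkpoint.
model_args = {
    "pretrained": "/root/autodl-tmp/DeepCoder-14B-Preview",
    "add_bos_token": "true",
    "max_model_len": "3096",
    "dtype": "bfloat16",
}


def build_command(task, num_fewshot, limit, batch_size):
    """Assemble an lm_eval argument list (hypothetical helper, not from the source)."""
    cmd = [
        "lm_eval",
        "--model", "vllm",
        "--model_args", ",".join(f"{key}={value}" for key, value in model_args.items()),
        "--tasks", task,
        "--limit", str(limit),
        "--batch_size", str(batch_size),
    ]
    # The mmlu run reports num_fewshot: None, so the flag is only added when set.
    if num_fewshot is not None:
        cmd += ["--num_fewshot", str(num_fewshot)]
    return cmd


gsm8k_cmd = build_command("gsm8k", num_fewshot=5, limit=250, batch_size="auto")
print(shlex.join(gsm8k_cmd))
```

Swapping `limit=250` for `limit=500`, or the task for `mmlu` with `num_fewshot=None`, `limit=15` and `batch_size=1`, would give the argument lists for the other header lines under the same assumption.
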
vllm (pretrained=/root/autodl-tmp/DeepCoder-14B-Preview,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.766|±  |0.0190|
|     |       |strict-match    |     5|exact_match|↑  |0.856|±  |0.0157|

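
The Stderr column in the two gsm8k tables above shrinks as `limit` grows from 250 to 500. A minimal sketch of that relationship, assuming the harness reports the sample standard error of the per-question 0/1 scores (the formula is an assumption and is not stated in the source; the helper name `sample_stderr` is hypothetical):

```python
import math


def sample_stderr(acc, n):
    """Standard error of the mean of n 0/1 scores, using the sample variance (n - 1)."""
    return math.sqrt(acc * (1.0 - acc) / (n - 1))


# (accuracy, limit, reported stderr) copied from the two gsm8k tables above.
rows = [
    (0.732, 250, 0.0281),
    (0.856, 250, 0.0222),
    (0.766, 500, 0.0190),
    (0.856, 500, 0.0157),
]

for acc, n, reported in rows:
    computed = sample_stderr(acc, n)
    print(f"acc={acc:.3f} n={n} computed={computed:.4f} reported={reported:.4f}")
```

Under this assumption the computed values agree with the reported ones to within rounding, which suggests that doubling `limit` narrows the error bar by roughly a factor of the square root of two.
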
vllm (pretrained=/root/autodl-tmp/DeepCoder-14B-Preview,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

|      Groups      |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu              |      2|none  |      |acc   |↑  |0.7345|±  |0.0139|
| - humanities     |      2|none  |      |acc   |↑  |0.7333|±  |0.0283|
| - other          |      2|none  |      |acc   |↑  |0.7385|±  |0.0295|
| - social sciences|      2|none  |      |acc   |↑  |0.8000|±  |0.0285|
| - stem           |      2|none  |      |acc   |↑  |0.6912|±  |0.0254|

vllm (pretrained=/root/autodl-tmp/80-256,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.768|±  |0.0268|
|     |       |strict-match    |     5|exact_match|↑  |0.868|±  |0.0215|

vllm (pretrained=/root/autodl-tmp/80-256,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.764|±  |0.0190|
|     |       |strict-match    |     5|exact_match|↑  |0.884|±  |0.0143|

vllm (pretrained=/root/autodl-tmp/80-256,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

|      Groups      |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu              |      2|none  |      |acc   |↑  |0.7345|±  |0.0139|
| - humanities     |      2|none  |      |acc   |↑  |0.7179|±  |0.0287|
| - other          |      2|none  |      |acc   |↑  |0.7538|±  |0.0287|
| - social sciences|      2|none  |      |acc   |↑  |0.8167|±  |0.0275|
| - stem           |      2|none  |      |acc   |↑  |0.6807|±  |0.0257|
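
To read the `DeepCoder-14B-Preview` and `80-256` results side by side, one rough option is to express each difference in units of the combined standard error. The sketch below uses only the values and Stderr figures reported above; the helper `z_score` is hypothetical, and treating the two runs as independent is a simplifying assumption (both runs score the same questions, so a paired test on per-sample outputs, which the source does not include, would be more precise).

```python
import math

# (label, base value, base stderr, 80-256 value, 80-256 stderr) copied from the tables above.
pairs = [
    ("gsm8k flexible-extract, limit 500", 0.766, 0.0190, 0.764, 0.0190),
    ("gsm8k strict-match, limit 500", 0.856, 0.0157, 0.884, 0.0143),
    ("mmlu, limit 15", 0.7345, 0.0139, 0.7345, 0.0139),
]


def z_score(a, se_a, b, se_b):
    """Difference b - a in units of the combined standard error (independence assumed)."""
    return (b - a) / math.sqrt(se_a ** 2 + se_b ** 2)


for label, base, se_base, other, se_other in pairs:
    z = z_score(base, se_base, other, se_other)
    print(f"{label}: diff={other - base:+.4f}, z={z:+.2f}")
```

Under these assumptions, an absolute z below about 2 means the gap is within the noise expected from the sample size, so none of the three differences listed here would count as a clear change between the two checkpoints.
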