The tables below are lm-evaluation-harness results run over a vLLM backend: GSM8K and MMLU for the original Dolphin-Mistral-24B-Venice-Edition, followed by the same runs for the quantized checkpoint (90-512-2048-9999999, presumably the W8A8 weights published here).

vllm (pretrained=/root/autodl-tmp/Dolphin-Mistral-24B-Venice-Edition,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true,gpu_memory_utilization=0.8), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|-------|--------:|------------------|-------:|-------------|---|------:|---|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.928 | ± | 0.0164 |
| | | strict-match | 5 | exact_match | ↑ | 0.924 | ± | 0.0168 |
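These header lines are the standard lm-evaluation-harness run summary, so a command along the following lines should reproduce a run like the one above. This is a reconstruction from the logged settings, not the author's exact shell command; all flags shown are standard lm_eval CLI options:

```bash
lm_eval --model vllm \
  --model_args pretrained=/root/autodl-tmp/Dolphin-Mistral-24B-Venice-Edition,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true,gpu_memory_utilization=0.8 \
  --tasks gsm8k \
  --num_fewshot 5 \
  --limit 250 \
  --batch_size auto
```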
vllm (pretrained=/root/autodl-tmp/Dolphin-Mistral-24B-Venice-Edition,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true,gpu_memory_utilization=0.8), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|-------|--------:|------------------|-------:|-------------|---|------:|---|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.922 | ± | 0.0120 |
| | | strict-match | 5 | exact_match | ↑ | 0.918 | ± | 0.0123 |
vllm (pretrained=/root/autodl-tmp/Dolphin-Mistral-24B-Venice-Edition,add_bos_token=true,max_model_len=3048,dtype=bfloat16,trust_remote_code=true,gpu_memory_utilization=0.9), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: auto
| Groups | Version | Filter | n-shot | Metric | | Value | | Stderr |
|-------------------|--------:|--------|-------:|-----|---|------:|---|-------:|
| mmlu | 2 | none | | acc | ↑ | 0.7918 | ± | 0.0131 |
| - humanities | 2 | none | | acc | ↑ | 0.8205 | ± | 0.0267 |
| - other | 2 | none | | acc | ↑ | 0.8103 | ± | 0.0266 |
| - social sciences | 2 | none | | acc | ↑ | 0.8611 | ± | 0.0248 |
| - stem | 2 | none | | acc | ↑ | 0.7158 | ± | 0.0250 |
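Note that the limit: 15.0 runs are quick smoke tests: with only 15 sampled examples per MMLU subtask, the per-group standard errors are several times larger than in the full runs, so the full-dataset tables are the ones worth comparing.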
vllm (pretrained=/root/autodl-tmp/Dolphin-Mistral-24B-Venice-Edition,add_bos_token=true,max_model_len=3048,dtype=bfloat16,trust_remote_code=true,gpu_memory_utilization=0.9), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 40
| Groups | Version | Filter | n-shot | Metric | | Value | | Stderr |
|-------------------|--------:|--------|-------:|-----|---|------:|---|-------:|
| mmlu | 2 | none | | acc | ↑ | 0.7924 | ± | 0.0033 |
| - humanities | 2 | none | | acc | ↑ | 0.7296 | ± | 0.0062 |
| - other | 2 | none | | acc | ↑ | 0.8381 | ± | 0.0063 |
| - social sciences | 2 | none | | acc | ↑ | 0.8765 | ± | 0.0058 |
| - stem | 2 | none | | acc | ↑ | 0.7590 | ± | 0.0073 |
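The full MMLU runs use no few-shot override, no example limit, and a higher GPU memory fraction. A matching invocation would look roughly like this (again a reconstruction from the header line, not the author's exact command):

```bash
lm_eval --model vllm \
  --model_args pretrained=/root/autodl-tmp/Dolphin-Mistral-24B-Venice-Edition,add_bos_token=true,max_model_len=3048,dtype=bfloat16,trust_remote_code=true,gpu_memory_utilization=0.9 \
  --tasks mmlu \
  --batch_size 40
```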
vllm (pretrained=/root/autodl-tmp/90-512-2048-9999999,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true,gpu_memory_utilization=0.8), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|-------|--------:|------------------|-------:|-------------|---|------:|---|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.928 | ± | 0.0164 |
| | | strict-match | 5 | exact_match | ↑ | 0.928 | ± | 0.0164 |
vllm (pretrained=/root/autodl-tmp/90-512-2048-9999999,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true,gpu_memory_utilization=0.8), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|-------|--------:|------------------|-------:|-------------|---|------:|---|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.914 | ± | 0.0126 |
| | | strict-match | 5 | exact_match | ↑ | 0.906 | ± | 0.0131 |
vllm (pretrained=/root/autodl-tmp/90-512-2048-9999999,add_bos_token=true,max_model_len=3048,dtype=bfloat16,trust_remote_code=true,gpu_memory_utilization=0.9), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: auto
| Groups | Version | Filter | n-shot | Metric | | Value | | Stderr |
|-------------------|--------:|--------|-------:|-----|---|------:|---|-------:|
| mmlu | 2 | none | | acc | ↑ | 0.7918 | ± | 0.0132 |
| - humanities | 2 | none | | acc | ↑ | 0.8256 | ± | 0.0261 |
| - other | 2 | none | | acc | ↑ | 0.7897 | ± | 0.0286 |
| - social sciences | 2 | none | | acc | ↑ | 0.8667 | ± | 0.0247 |
| - stem | 2 | none | | acc | ↑ | 0.7228 | ± | 0.0251 |
vllm (pretrained=/root/autodl-tmp/90-512-2048-9999999,add_bos_token=true,max_model_len=3048,dtype=bfloat16,trust_remote_code=true,gpu_memory_utilization=0.9), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto
| Groups | Version | Filter | n-shot | Metric | | Value | | Stderr |
|-------------------|--------:|--------|-------:|-----|---|------:|---|-------:|
| mmlu | 2 | none | | acc | ↑ | 0.7894 | ± | 0.0033 |
| - humanities | 2 | none | | acc | ↑ | 0.7294 | ± | 0.0062 |
| - other | 2 | none | | acc | ↑ | 0.8307 | ± | 0.0065 |
| - social sciences | 2 | none | | acc | ↑ | 0.8759 | ± | 0.0059 |
| - stem | 2 | none | | acc | ↑ | 0.7539 | ± | 0.0074 |
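Comparing the full runs, the quantized checkpoint lands within roughly one standard error of the original model: MMLU accuracy 0.7894 vs 0.7924, and GSM8K strict-match at limit 500 0.906 vs 0.918. On these benchmarks the W8A8 quantization appears close to lossless.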
Model: noneUsername/Dolphin-Mistral-24B-Venice-Edition-W8A8 (base model: mistralai/Mistral-Small-24B-Base-2501)