zhoujun committed on
Commit 7e8a9a4 · verified · 1 Parent(s): a28a79c

Update README.md

Files changed (1):
  1. README.md (+3 −3)
README.md CHANGED
@@ -18,7 +18,7 @@ The leaderboard is evaluated with our evaluation [code](https://github.com/LLM36
 | **Science** | GPQA-diamond (avg@4) | 40.78 | 38.64 | 37.63 | 35.98 | 50.63 | 55.67 | 46.46 |
 | | SuperGPQA | 31.80 | 30.64 | 29.75 | 27.29 | 43.60 | 46.05 | 37.73 |
 | **Logic** | ARC-AGI (avg@4) | 3.31 | 0.75 | 0.00 | 0.50 | 7.63 | 2.31 | 5.25 |
- | | Zebra Puzzle (avg@4) | 39.40 | 0.07 | 1.00 | 0.62 | 45.21 | 0.54 | 1.16 |
+ | | Zebra Puzzle (avg@4) | 39.40 | 0.07 | 1.00 | 0.62 | 45.21 | 0.54 | 1.16 |
 | **Simulation** | CodeI/O (avg@4) | 15.63 | 7.13 | 5.13 | 6.63 | 12.63 | 3.75 | 9.75 |
 | | CruxEval-I | 61.72 | 63.63 | 69.38 | 56.25 | 80.63 | 71.13 | 72.63 |
 | | CruxEval-O | 71.28 | 56.50 | 65.88 | 58.31 | 88.75 | 82.38 | 67.75 |
@@ -26,7 +26,7 @@ The leaderboard is evaluated with our evaluation [code](https://github.com/LLM36
 | | HiTab | 74.20 | 54.40 | 54.10 | 50.40 | 82.00 | 63.30 | 69.00 |
 | | MultiHiertt (avg@4) | 44.94 | 31.62 | 38.10 | 37.57 | 55.28 | 52.83 | 52.83 |
 | **Others** | IFEval | 35.81 | 39.56 | 32.72 | 36.69 | 55.45 | 38.26 | 55.27 |
- | | LiveBench | 18.57 | 19.76 | 12.64 | 15.20 | 34.30 | 28.78 | 28.33 |
+ | | LiveBench | 18.57 | 19.76 | 12.64 | 15.20 | 34.30 | 28.78 | 28.33 |
 | | **Average Score** | **43.29** | **33.76** | **35.42** | **33.97** | **54.24** | **47.53** | **46.25** |

@@ -35,7 +35,7 @@ Example usage:
 ```python
 from transformers import AutoTokenizer, AutoModelForCausalLM

- model = "LLM360/Guru-32B"
+ model = "LLM360/Guru-7B"
 tokenizer = AutoTokenizer.from_pretrained(model)
 model = AutoModelForCausalLM.from_pretrained(model, device_map="auto", torch_dtype="auto")
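For reference, the hunk above truncates the README's usage example. A minimal end-to-end sketch of how the loaded model would be used, assuming the standard `transformers` generation API (the prompt text, variable names, and generation settings below are illustrative assumptions, not taken from the README):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# `model_id` is renamed from the README snippet's `model` string so it
# is not shadowed by the model object loaded two lines below.
model_id = "LLM360/Guru-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

# Illustrative prompt and generation settings (assumptions, not from the README).
prompt = "Solve step by step: if 3x + 5 = 20, what is x?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```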