Adding Evaluation Results
#13 by leaderboard-pr-bot - opened

README.md CHANGED
    
@@ -52,3 +52,17 @@ for step in range(5):
     # pretty print last ouput tokens from bot
     print("DialoGPT: {}".format(tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)))
 ```
+
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_microsoft__DialoGPT-medium)
+
+| Metric                | Value                     |
+|-----------------------|---------------------------|
+| Avg.                  | 24.74   |
+| ARC (25-shot)         | 24.49          |
+| HellaSwag (10-shot)   | 26.21    |
+| MMLU (5-shot)         | 25.84         |
+| TruthfulQA (0-shot)   | 47.06   |
+| Winogrande (5-shot)   | 49.57   |
+| GSM8K (5-shot)        | 0.0        |
+| DROP (3-shot)         | 0.0         |
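
The `Avg.` row appears to be the unweighted mean of the seven benchmark scores in the added table. The PR does not state how the average is computed, so treat that as an assumption. A minimal Python sketch that checks the arithmetic against the reported value (the `scores` dict and `mean_score` helper are hypothetical names used only for illustration):

```python
# Benchmark scores copied from the table added in this PR.
scores = {
    "ARC (25-shot)": 24.49,
    "HellaSwag (10-shot)": 26.21,
    "MMLU (5-shot)": 25.84,
    "TruthfulQA (0-shot)": 47.06,
    "Winogrande (5-shot)": 49.57,
    "GSM8K (5-shot)": 0.0,
    "DROP (3-shot)": 0.0,
}


def mean_score(values):
    """Unweighted arithmetic mean, rounded to two decimals.

    Assumption: the leaderboard averages all benchmarks equally and
    reports two decimal places; the PR itself does not confirm this.
    """
    values = list(values)
    return round(sum(values) / len(values), 2)


reported_avg = 24.74  # the "Avg." row in the table
computed_avg = mean_score(scores.values())
print(computed_avg)  # 24.74
```

Under this assumption the computed mean matches the reported `Avg.`, and the two 0.0 scores (GSM8K and DROP) count toward the average rather than being excluded.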
