rwmasood committed on
Commit 019c2df · verified · 1 Parent(s): feede7d

Update README.md

Files changed (1)
  1. README.md +10 -10
README.md CHANGED
@@ -72,7 +72,16 @@ The following two different evaluations are performed.
  Perplexity (PPL) is a metric used to evaluate the performance of language models. It measures how well a probability distribution or a language model predicts a sample. A **lower perplexity** score indicates better performance (i.e., the model is more confident in its predictions).
 
 
+ #### Main Results
 
+ | Model | Perplexity Score |
+ |---------------------------------------------|----------|
+ | **Llama-3.1-8B-Instruct** | 842611366.59 |
+ | **Llama-3.1-10B-Instruct** | 2890.31 |
+
+
+
+ #### Scripts to generate evaluation results
  ```python
  from evaluate import load
  import datasets
@@ -96,13 +105,6 @@ print(round(results["mean_perplexity"], 2))
 
 
 
- #### Main Results
-
- | Model | Perplexity Score |
- |---------------------------------------------|----------|
- | **Llama-3.1-8B-Instruct** | 842611366.59 |
- | **Llama-3.1-10B-Instruct** | 2890.31 |
-
 
  ### Harness Evaluation
 
@@ -117,7 +119,7 @@ The library used is [lm-evaluation-harness repository](https://github.com/Eleuth
  | **Llama-3.1-8B-Instruct** | **73** | **71.1** | **87.9** |
 
 
- ### Scripts to generate evaluation results
+ #### Scripts to generate evaluation results
 
  ```python
  # install from https://github.com/EleutherAI/lm-evaluation-harness
@@ -130,8 +132,6 @@ tasks_list = ["arc_challenge", "gpqa", "ifeval", "mmlu_pro", "hellaswag"] # Ben
  model_path='rwmasood/llama-3.1-10b-instruct'
  model_name_or_path = "./output/checkpoint-2800"
 
- ```
-
  # Run evaluation
  results = evaluator.simple_evaluate(
      model="hf", # Hugging Face model
 
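The perplexity snippet in the first hunk is cut off at the hunk boundary; only the imports and the final `print(round(results["mean_perplexity"], 2))` line are visible. A minimal, self-contained sketch of how such an evaluation is typically wired up with the `evaluate` library follows; the dataset, slice size, and batch size are illustrative assumptions rather than the README's actual settings.

```python
from evaluate import load
import datasets

# Load the perplexity metric from the Hugging Face `evaluate` library.
perplexity = load("perplexity", module_type="metric")

# Assumption: a small slice of WikiText-2 as the evaluation corpus.
wikitext = datasets.load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
texts = [t for t in wikitext["text"] if t.strip()][:100]

# Model id taken from the diff; the batch size is an assumption.
results = perplexity.compute(
    predictions=texts,
    model_id="rwmasood/llama-3.1-10b-instruct",
    batch_size=4,
)

# The metric returns per-sample perplexities and their mean.
print(round(results["mean_perplexity"], 2))
```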
 
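Likewise, the lm-evaluation-harness snippet appears only in fragments across the last two hunks. A sketch of a complete `simple_evaluate` call assembled from those fragments is shown below; everything not visible in the diff (which checkpoint is passed, `model_args`, `batch_size`, and the result handling) is an assumption.

```python
# install from https://github.com/EleutherAI/lm-evaluation-harness
from lm_eval import evaluator

# Benchmarks listed in the README
tasks_list = ["arc_challenge", "gpqa", "ifeval", "mmlu_pro", "hellaswag"]

# Both paths appear in the diff; which one is evaluated here is an assumption.
model_path = 'rwmasood/llama-3.1-10b-instruct'
model_name_or_path = "./output/checkpoint-2800"

# Run evaluation
results = evaluator.simple_evaluate(
    model="hf",                             # Hugging Face model backend
    model_args=f"pretrained={model_path}",  # assumed: score the Hub model
    tasks=tasks_list,
    batch_size=8,                           # assumed batch size
)

# Per-task metrics are returned under results["results"].
for task, metrics in results["results"].items():
    print(task, metrics)
```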
 