- **Repository:** https://github.com/huggingface/alignment-handbook
- **Demo:** https://huggingface.co/spaces/HuggingFaceH4/starchat2-playground

## Performance

StarChat2 15B was trained to balance chat and programming capabilities. It achieves strong performance on chat benchmarks like [MT Bench](https://huggingface.co/spaces/lmsys/mt-bench) and [IFEval](https://arxiv.org/abs/2311.07911), as well as on the canonical HumanEval benchmark for Python code completion. The scores reported below were obtained with the [LightEval](https://github.com/huggingface/lighteval) evaluation suite (commit `988959cb905df4baa050f82b4d499d46e8b537f2`), and each prompt was formatted with the model's corresponding chat template to simulate real-world usage (see the sketch after the table). This is why some scores may differ from those reported in technical reports or on the Open LLM Leaderboard.
| Model | MT Bench | IFEval | HumanEval |
|-------------------------------------------------------------------------------------------------|---------:|-------:|----------:|
| [starchat2-15b-v0.1](https://huggingface.co/HuggingFaceH4/starchat2-15b-v0.1) | 7.66 | 35.12 | 71.34 |
| [deepseek-coder-6.7b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct) | 4.17 | 14.23 | 80.48 |
| [CodeLlama-13b-Instruct-hf](https://huggingface.co/codellama/CodeLlama-13b-Instruct-hf) | 6.80 | 43.44 | 50.60 |
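For reference, the chat-template formatting described above can be reproduced with the `apply_chat_template` method from 🤗 Transformers. The snippet below is a minimal sketch: the message contents are illustrative, and the full LightEval harness and its generation settings are not shown.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/starchat2-15b-v0.1")

# An illustrative conversation; benchmark prompts are formatted the same way.
messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function that reverses a string."},
]

# Render the conversation with the model's own chat template and append the
# generation prompt, producing the string that would be passed to the model.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```

Because the template inserts the model's special role tokens around each turn, scores obtained this way can differ from raw-completion evaluations, as noted above.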
## Intended uses & limitations