Update README.md
README.md CHANGED
@@ -30,17 +30,17 @@ For question-answering, Jan-v1 shows a significant performance gain from model s
 
 *The 91.2% SimpleQA accuracy represents a significant milestone in factual question answering for models of this scale, demonstrating the effectiveness of our scaling and fine-tuning approach.*
 
-###
-
-
-
-
-
-
-
-
-
-
+### Chat Benchmarks
+
+These benchmarks evaluate the model's conversational and instructional capabilities.
+
+| Benchmark | Jan-v1 (Ours) | Qwen3-4B-Thinking-2507 | GPT-OSS-20B (High) | GPT-OSS-20B (Low) |
+| :--- | :--- | :--- | :--- | :--- |
+| EQBench | **83.61** | 82.61 | 78.35 | 78.35 |
+| CreativeWriting | **72.08** | 65.74 | 30.23 | 26.38 |
+| IFBench | **Prompt:** 0.3537<br>**Instruction:** 0.3910 | Prompt: 0.4490<br>Instruction: **0.4806** | Prompt: 0.5646<br>Instruction: 0.6000 | Prompt: 0.5034<br>Instruction: 0.5403 |
+| ArenaHardv2 | **25.3** | - | - | - |
+
 ## Quick Start
 
 ### Integration with Jan App
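The hunk closes at the README's unchanged "Quick Start" / "Integration with Jan App" context. For readers who want to try the model behind these benchmark numbers, below is a minimal sketch of sending a chat request to a local OpenAI-compatible endpoint such as the one the Jan App can expose; the base URL, port, model id, and prompt are assumptions for illustration, not values taken from this commit.

```python
# Minimal sketch: querying Jan-v1 through a local OpenAI-compatible endpoint.
# The base URL, port, and model id are assumptions; check the Jan App's local
# API server settings for the actual values on your machine.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1337/v1",  # assumed local server address
    api_key="not-needed",                 # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="jan-v1-4b",  # hypothetical model id; use the id shown in the Jan App
    messages=[{"role": "user", "content": "Who wrote 'The Selfish Gene'?"}],
)
print(response.choices[0].message.content)
```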