jan-hq committed
Commit 307e403 · verified · 1 Parent(s): e552b3e

Update README.md

Files changed (1)
  1. README.md +11 -11
README.md CHANGED
@@ -30,17 +30,17 @@ For question-answering, Jan-v1 shows a significant performance gain from model s
  
  *The 91.2% SimpleQA accuracy represents a significant milestone in factual question answering for models of this scale, demonstrating the effectiveness of our scaling and fine-tuning approach.*
  
- ### Report Generation & Factuality
- Evaluated on a benchmark testing factual report generation from web sources, using an LLM-as-judge. The benchmark includes our proprietary `Jan Exam - Longform` and the `DeepResearchBench`.
-
- | Model | Average Overall Score |
- | :--- | :--- |
- | o4-mini | 7.30 |
- | **Jan-v1-4B (Ours)** | **7.17** |
- | gpt-4.1 | 6.90 |
- | Qwen3-4B-Thinking-2507 | 6.84 |
- | 4o-mini | 6.60 |
- | Jan-nano-128k | 5.63 |
+ ### Chat Benchmarks
+
+ These benchmarks evaluate the model's conversational and instruction-following capabilities.
+
+ | Benchmark | Jan-v1 (Ours) | Qwen3-4B-Thinking-2507 | GPT-OSS-20B (High) | GPT-OSS-20B (Low) |
+ | :--- | :--- | :--- | :--- | :--- |
+ | EQBench | **83.61** | 82.61 | 78.35 | 78.35 |
+ | CreativeWriting | **72.08** | 65.74 | 30.23 | 26.38 |
+ | IFBench | **Prompt:** 0.3537<br>**Instruction:** 0.3910 | Prompt: 0.4490<br>Instruction: **0.4806** | Prompt: 0.5646<br>Instruction: 0.6000 | Prompt: 0.5034<br>Instruction: 0.5403 |
+ | ArenaHardv2 | **25.3** | - | - | - |
+
  ## Quick Start
  
  ### Integration with Jan App