jan-hq committed · verified
Commit 9935c46 · Parent(s): 6a2ba03

Update README.md

Files changed (1): README.md (+13 −1)
README.md CHANGED
```diff
@@ -23,7 +23,8 @@ Jan-v1 leverages the newly released [Qwen3-4B-thinking](https://huggingface.co/Q
 
 ## Evaluation
 
-Jan-v1's strategic scaling has resulted in a notable performance uplift. Following the established MCP benchmark methodology, Jan-v1 sets a new standard for models in its class.
+### Question Answering (SimpleQA)
+For question-answering, Jan-v1 shows a significant performance gain from model scaling, achieving 91.2% accuracy.
 
 | Model | SimpleQA Accuracy |
 | :--- | :--- |
@@ -44,6 +45,17 @@ Jan-v1's strategic scaling has resulted in a notable performance uplift. Followi
 
 *The 91.2% SimpleQA accuracy represents a significant milestone in factual question answering for models of this scale, demonstrating the effectiveness of our scaling and fine-tuning approach.*
 
+### Report Generation & Factuality
+Evaluated on a benchmark testing factual report generation from web sources, using an LLM-as-judge. The benchmark includes our proprietary `Jan Exam - Longform` and the `DeepResearchBench`.
+
+| Model | Average Overall Score |
+| :--- | :--- |
+| o4-mini | 7.30 |
+| **Jan-v1-4B (Ours)** | **7.17** |
+| gpt-4.1 | 6.90 |
+| Qwen3-4B-Thinking-2507 | 6.84 |
+| 4o-mini | 6.60 |
+| Jan-nano-128k | 5.63 |
 ## Quick Start
 
 ### Integration with Jan App
```
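
The SimpleQA figure added in the diff above is a fraction-correct metric over gold answers. As a hedged sketch only (the real SimpleQA harness uses a model-based grader; the functions, normalization, and sample data below are hypothetical), the accuracy computation might look like:

```python
# Hedged sketch: SimpleQA-style accuracy as the percentage of model answers
# that match the gold answer. Grading here is naive normalized exact match,
# standing in for the benchmark's actual (model-based) grader.

def normalize(text: str) -> str:
    """Lowercase and strip surrounding whitespace and trailing periods."""
    return text.strip().strip(".").lower()

def simpleqa_accuracy(predictions: list[str], golds: list[str]) -> float:
    """Return the percentage of predictions matching the gold answers."""
    assert len(predictions) == len(golds) and golds
    correct = sum(normalize(p) == normalize(g) for p, g in zip(predictions, golds))
    return 100.0 * correct / len(golds)

# Made-up sample: 2 of 3 answers are correct.
preds = ["Paris.", "1969", "Mount Everest"]
golds = ["paris", "1969", "K2"]
print(simpleqa_accuracy(preds, golds))
```

A reported 91.2% would correspond to this ratio computed over the full SimpleQA question set.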
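
The "Average Overall Score" column in the report-generation table is a mean of per-report LLM-judge scores. A minimal sketch of that aggregation, assuming a 0–10 overall score per report (the scale, rounding, and sample scores are assumptions, not taken from the benchmark):

```python
from statistics import mean

# Hedged sketch: aggregate per-report LLM-as-judge overall scores into a
# single benchmark figure. Each evaluated report receives one overall score
# from the judge model; the table value is the mean over all reports.

def average_overall_score(judge_scores: list[float]) -> float:
    """Mean judge score, rounded to two decimals as in the results table."""
    return round(mean(judge_scores), 2)

# Made-up judge scores for four generated reports.
print(average_overall_score([7.5, 6.8, 7.2, 7.1]))
```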