Update README.md
## Evaluation

### Question Answering (SimpleQA)

For question-answering, Jan-v1 shows a significant performance gain from model scaling, achieving 91.2% accuracy.

| Model | SimpleQA Accuracy |
| :--- | :--- |

Jan-v1's strategic scaling has resulted in a notable performance uplift.

*The 91.2% SimpleQA accuracy represents a significant milestone in factual question answering for models of this scale, demonstrating the effectiveness of our scaling and fine-tuning approach.*

### Report Generation & Factuality

We evaluate factual report generation from web sources using an LLM-as-judge. The benchmark suite includes our proprietary `Jan Exam - Longform` and `DeepResearchBench`. A minimal sketch of this kind of judging setup follows the table below.

| Model | Average Overall Score |
| :--- | :--- |
| o4-mini | 7.30 |
| **Jan-v1-4B (Ours)** | **7.17** |
| gpt-4.1 | 6.90 |
| Qwen3-4B-Thinking-2507 | 6.84 |
| 4o-mini | 6.60 |
| Jan-nano-128k | 5.63 |
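
To make the setup concrete, here is a minimal sketch of an LLM-as-judge scoring loop. Everything in it is illustrative rather than the actual benchmark implementation: the `JUDGE_PROMPT` rubric, the `judge_report` and `average_overall_score` helpers, the choice of `gpt-4.1` as the judge model, and the assumption that scores lie on a 1-10 scale averaged across reports.

```python
import re
import statistics
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical rubric; the real benchmark prompts are not public.
JUDGE_PROMPT = """You are grading a research report for factual accuracy,
coverage of the question, and citation quality.
Question: {question}
Report: {report}
Reply with a single line: SCORE: <number from 1 to 10>."""


def judge_report(question: str, report: str, judge_model: str = "gpt-4.1") -> float:
    """Ask a judge LLM to score one generated report on a 1-10 scale."""
    response = client.chat.completions.create(
        model=judge_model,
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(question=question, report=report),
        }],
        temperature=0.0,  # deterministic judging
    )
    text = response.choices[0].message.content
    match = re.search(r"SCORE:\s*([\d.]+)", text)
    if match is None:
        raise ValueError(f"Judge returned no parseable score: {text!r}")
    return float(match.group(1))


def average_overall_score(samples: list[dict]) -> float:
    """Average judge scores over (question, report) pairs,
    yielding a single number comparable across models."""
    return statistics.mean(
        judge_report(s["question"], s["report"]) for s in samples
    )
```

Averaging per-report scores this way produces one "Average Overall Score" per model, as in the table above; the actual benchmarks may additionally weight multiple rubric dimensions before averaging.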

## Quick Start

### Integration with Jan App