jan-hq committed
Commit 307e403 · verified · 1 Parent(s): e552b3e

Update README.md

Files changed (1)
  1. README.md +11 -11
README.md CHANGED
@@ -30,17 +30,17 @@ For question-answering, Jan-v1 shows a significant performance gain from model s
  
  *The 91.2% SimpleQA accuracy represents a significant milestone in factual question answering for models of this scale, demonstrating the effectiveness of our scaling and fine-tuning approach.*
  
- ### Report Generation & Factuality
- Evaluated on a benchmark testing factual report generation from web sources, using an LLM-as-judge. The benchmark includes our proprietary `Jan Exam - Longform` and the `DeepResearchBench`.
-
- | Model | Average Overall Score |
- | :--- | :--- |
- | o4-mini | 7.30 |
- | **Jan-v1-4B (Ours)** | **7.17** |
- | gpt-4.1 | 6.90 |
- | Qwen3-4B-Thinking-2507 | 6.84 |
- | 4o-mini | 6.60 |
- | Jan-nano-128k | 5.63 |
+ ### Chat Benchmarks
+
+ These benchmarks evaluate the model's conversational and instruction-following capabilities.
+
+ | Benchmark | Jan-v1 (Ours) | Qwen3-4B-Thinking-2507 | GPT-OSS-20B (High) | GPT-OSS-20B (Low) |
+ | :--- | :--- | :--- | :--- | :--- |
+ | EQBench | **83.61** | 82.61 | 78.35 | 78.35 |
+ | CreativeWriting | **72.08** | 65.74 | 30.23 | 26.38 |
+ | IFBench | **Prompt:** 0.3537<br>**Instruction:** 0.3910 | Prompt: 0.4490<br>Instruction: **0.4806** | Prompt: 0.5646<br>Instruction: 0.6000 | Prompt: 0.5034<br>Instruction: 0.5403 |
+ | ArenaHardv2 | **25.3** | - | - | - |
+
  ## Quick Start
  
  ### Integration with Jan App