Update README.md
    	
README.md CHANGED
    
@@ -23,7 +23,8 @@ Jan-v1 leverages the newly released [Qwen3-4B-thinking](https://huggingface.co/Q
 
 ## Evaluation
 
-
+### Question Answering (SimpleQA)
+For question-answering, Jan-v1 shows a significant performance gain from model scaling, achieving 91.2% accuracy.
 
 | Model | SimpleQA Accuracy |
 | :--- | :--- |
@@ -44,6 +45,17 @@ Jan-v1's strategic scaling has resulted in a notable performance uplift. Followi
 
 *The 91.2% SimpleQA accuracy represents a significant milestone in factual question answering for models of this scale, demonstrating the effectiveness of our scaling and fine-tuning approach.*
 
+### Report Generation & Factuality
+Evaluated on a benchmark testing factual report generation from web sources, using an LLM-as-judge. The benchmark includes our proprietary `Jan Exam - Longform` and the `DeepResearchBench`.
+
+| Model | Average Overall Score |
+| :--- | :--- |
+| o4-mini | 7.30 |
+| **Jan-v1-4B (Ours)** | **7.17** |
+| gpt-4.1 | 6.90 |
+| Qwen3-4B-Thinking-2507 | 6.84 |
+| 4o-mini | 6.60 |
+| Jan-nano-128k | 5.63 |
 ## Quick Start
 
 ### Integration with Jan App

