Sam Heutmaker committed
Commit · 4ffc59c
1 Parent(s): fb437aa

fix graphs
    	
README.md CHANGED

@@ -51,13 +51,14 @@ Performance metrics on our internal evaluation set:
 
 ### Benchmark Visualizations
 
-<
-  <img src="./assets/judge-score.png" alt="Average Judge Score Comparison" width="
-  <img src="./assets/rouge-1.png" alt="ROUGE-1 Score Comparison" width="
-
-
-  <img src="./assets/
-
+<p align="center">
+  <img src="./assets/judge-score.png" alt="Average Judge Score Comparison" width="49%" />
+  <img src="./assets/rouge-1.png" alt="ROUGE-1 Score Comparison" width="49%" />
+</p>
+<p align="center">
+  <img src="./assets/rouge-L.png" alt="ROUGE-L Score Comparison" width="49%" />
+  <img src="./assets/bleu.png" alt="BLEU Score Comparison" width="49%" />
+</p>
 
 FP8 quantization showed no measurable quality degradation compared to bf16 precision.
 
@@ -75,9 +76,7 @@ GrassData/ClipTagger-12b delivers frontier-quality performance at a fraction of
 
 *Cost calculations based on 700 input tokens and 250 output tokens per generation.
 
-<
-  <img src="./assets/cost.png" alt="Cost Comparison Per 1 Million Generations" width="80%" />
-</div>
+<img src="./assets/cost.png" alt="Cost Comparison Per 1 Million Generations" width="100%" />
 
 ClipTagger-12b offers **15x cost savings** compared to GPT-4.1 and **17x cost savings** compared to Claude 4 Sonnet, while maintaining comparable quality metrics.
 
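
The README's cost footnote (700 input tokens and 250 output tokens per generation) implies a simple per-generation calculation behind the "Cost Comparison Per 1 Million Generations" chart. The sketch below shows one way that arithmetic could be reproduced. The function names and the per-token prices are hypothetical placeholders, not figures from the README or from any provider's price list.

```python
# Token counts taken from the README footnote.
INPUT_TOKENS = 700
OUTPUT_TOKENS = 250


def cost_per_generations(input_price_per_mtok, output_price_per_mtok,
                         generations=1_000_000):
    """Total USD cost for `generations` calls.

    Prices are USD per one million tokens; both are assumed inputs,
    since the README excerpt does not list any prices.
    """
    usd_per_million_calls = (INPUT_TOKENS * input_price_per_mtok
                             + OUTPUT_TOKENS * output_price_per_mtok)
    # Multiply before dividing to keep the arithmetic exact for round inputs.
    return usd_per_million_calls * generations / 1_000_000


def savings_multiple(baseline_cost, candidate_cost):
    """How many times cheaper the candidate is than the baseline."""
    return baseline_cost / candidate_cost


if __name__ == "__main__":
    # Hypothetical prices, chosen only to illustrate the calculation.
    baseline = cost_per_generations(1.0, 2.0)
    candidate = cost_per_generations(0.1, 0.2)
    print(f"baseline:  ${baseline:,.2f} per 1M generations")
    print(f"candidate: ${candidate:,.2f} per 1M generations")
    print(f"savings:   {savings_multiple(baseline, candidate):.1f}x")
```

Substituting real per-token prices for each model would be needed to check the 15x and 17x figures quoted in the README; the placeholders above are not expected to reproduce them.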
