zhiyucheng committed
Commit b64a154 · verified · 1 Parent(s): b335a0b

Reformat table

Files changed (1): README.md (+27 -6)
README.md CHANGED
@@ -90,12 +90,33 @@ Please refer to the [TensorRT-LLM benchmarking documentation](https://github.com
 
 ## Evaluation
 The accuracy (MMLU, 5-shot) and throughput (tokens per second, TPS) benchmark results are presented in the table below:
-
-| Precision | MMLU | TPS |
-|-----------|-------|---------|
-| FP16 | 68.6 | 8,579.93 |
-| FP8 | 68.3 | 11,062.90 |
-
+<table>
+<tr>
+<td><strong>Precision</strong>
+</td>
+<td><strong>MMLU</strong>
+</td>
+<td><strong>TPS</strong>
+</td>
+</tr>
+<tr>
+<td>FP16
+</td>
+<td>68.6
+</td>
+<td>8,579.93
+</td>
+</tr>
+<tr>
+<td>FP8
+</td>
+<td>68.3
+</td>
+<td>11,062.90
+</td>
+</tr>
+</table>
 
 We benchmarked with tensorrt-llm v0.13 on 8 H100 GPUs, using batch size 1024 for the throughput measurements with in-flight batching enabled. We achieved a **~1.3x** speedup with FP8.
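As a quick sanity check on the quoted **~1.3x** figure, the speedup is simply the ratio of the two TPS numbers in the table above (a minimal sketch; the values are copied from the table, not re-measured):

```python
# Throughput values (tokens per second) from the evaluation table above.
fp16_tps = 8_579.93
fp8_tps = 11_062.90

# FP8 speedup over FP16 is the ratio of the two throughputs.
speedup = fp8_tps / fp16_tps
print(f"FP8 speedup over FP16: {speedup:.2f}x")
```

This evaluates to roughly 1.29x, consistent with the ~1.3x claimed in the README.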