Reformat table
README.md
@@ -90,12 +90,33 @@ Please refer to the [TensorRT-LLM benchmarking documentation](https://github.com
## Evaluation

The accuracy (MMLU, 5-shot) and throughput (tokens per second, TPS) benchmark results are presented in the table below:

<table>
<tr>
<td><strong>Precision</strong></td>
<td><strong>MMLU</strong></td>
<td><strong>TPS</strong></td>
</tr>
<tr>
<td>FP16</td>
<td>68.6</td>
<td>8,579.93</td>
</tr>
<tr>
<td>FP8</td>
<td>68.3</td>
<td>11,062.90</td>
</tr>
</table>

We benchmarked with TensorRT-LLM v0.13 on 8 H100 GPUs, measuring throughput at batch size 1024 with in-flight batching enabled. FP8 achieved a **~1.3x** speedup over FP16.
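
As a quick sanity check, the reported speedup can be recomputed from the two TPS values in the table. The snippet below is only an illustrative sketch: the `results` dictionary and the `speedup` helper are hypothetical names introduced here and are not part of the model card or of TensorRT-LLM.

```python
# MMLU scores and throughput (tokens per second) copied from the table above.
# The dictionary layout and the helper name are assumptions for illustration.
results = {
    "FP16": {"mmlu": 68.6, "tps": 8579.93},
    "FP8": {"mmlu": 68.3, "tps": 11062.90},
}


def speedup(baseline_tps: float, candidate_tps: float) -> float:
    """Return the throughput ratio candidate / baseline."""
    return candidate_tps / baseline_tps


fp8_speedup = speedup(results["FP16"]["tps"], results["FP8"]["tps"])
mmlu_delta = results["FP8"]["mmlu"] - results["FP16"]["mmlu"]

print(f"FP8 speedup over FP16: {fp8_speedup:.2f}x")
print(f"MMLU change (FP8 - FP16): {mmlu_delta:+.1f} points")
```

The ratio comes out to roughly 1.29, which rounds to the quoted ~1.3x, and the MMLU score drops by 0.3 points.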