Update README.md
README.md
CHANGED
@@ -102,6 +102,8 @@ We measured the average inference speed (tokens/s) of generating 1024 new tokens
 |BF16 | 33.40 | 31.91 | 21.33|
 |INT4 | - | 31.95 | - |
 
+The profiling runs on a single A800-SXM4-80G GPU with PyTorch 2.4.0 and CUDA 12.1.
+
 
 ## 🚀 How to use the model
 