RedHatAI/gemma-3-1b-it-quantized.w8a8

Text Generation

text-generation-inference

8-bit precision

compressed-tensors


nm-research committed on Jun 5

Commit 4ef76e4 · verified · 1 Parent(s): 24b86ed

Update README.md

Files changed (1)

README.md +2 -2


@@ -39,7 +39,7 @@ from vllm import LLM, SamplingParams
 # prepare model
 llm = LLM(
-    model="nm-testing/gemma-3-1b-it-quantized.w8a8",
+    model="RedHatAI/gemma-3-1b-it-quantized.w8a8",
     trust_remote_code=True,
     max_model_len=4096,
     max_num_seqs=2,
@@ -183,7 +183,7 @@ lm_eval \
       <th>Category</th>
       <th>Metric</th>
       <th>google/gemma-3-1b-it</th>
-      <th>nm-testing/gemma-3-1b-it-quantized.w8a8</th>
+      <th>RedHatAI/gemma-3-1b-it-quantized.w8a8</th>
       <th>Recovery (%)</th>
     </tr>
   </thead>