Update README.md
README.md
CHANGED

@@ -39,7 +39,6 @@ Once your server is started, you can query the model using the OpenAI API:
 ```python
 from openai import OpenAI

-# Modify OpenAI's API key and API base to use vLLM's API server.
 openai_api_key = "EMPTY"
 openai_api_base = "http://localhost:8000/v1"
 client = OpenAI(
@@ -62,7 +61,7 @@ And there's more. You can run 2:4 sparse models on vLLM and get significant spee
 prompt = f"Give a TL;DR of the following Reddit post.\n<|user|>{post}\nTL;DR:\n<|assistant|>\n"

 completion = client.completions.create(
-    model="RedHatAI/Llama-3.1-8B-tldr",
+    model="RedHatAI/Llama-3.1-8B-tldr-FP8-dynamic",
     prompt=prompt,
     max_tokens=256,
 )
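For context, here is a minimal sketch of what the updated README snippet looks like end to end after this change. It assumes a vLLM OpenAI-compatible server is already running at `localhost:8000`, and the `post` string and the final `print` line are placeholders added for illustration, not part of the diff:

```python
from openai import OpenAI

# vLLM's OpenAI-compatible server does not check the key, but the client requires one.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

# Placeholder Reddit post; substitute the text you want summarized.
post = "I rebuilt my mechanical keyboard over the weekend and it took far longer than expected..."

prompt = f"Give a TL;DR of the following Reddit post.\n<|user|>{post}\nTL;DR:\n<|assistant|>\n"

completion = client.completions.create(
    model="RedHatAI/Llama-3.1-8B-tldr-FP8-dynamic",
    prompt=prompt,
    max_tokens=256,
)
print(completion.choices[0].text)
```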