alexmarques committed (verified)
Commit a6e7ead · 1 parent: d3edd4c

Update README.md

Files changed (1):
  1. README.md (+44, -0)

README.md CHANGED
@@ -70,6 +70,50 @@ print(generated_text)
 
 vLLM also supports OpenAI-compatible serving. See the [documentation](https://docs.vllm.ai/en/latest/) for more details.
 
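As a brief illustration of the OpenAI-compatible mode, here is a minimal client-side sketch. It assumes a server is already running for this checkpoint; the served model name is a placeholder, so substitute the actual repository or local path:

```python
# Minimal sketch of querying a vLLM OpenAI-compatible server.
# Assumes the server was started first, e.g.:
#   vllm serve Qwen2.5-7B-Instruct-FP8-dynamic
# The model id below is a placeholder; use the name the server reports.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen2.5-7B-Instruct-FP8-dynamic",
    messages=[{"role": "user", "content": "Give me a short introduction to large language models."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```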
+## Creation
+
+<details>
+<summary>Creation details</summary>
+This model was created with [llm-compressor](https://github.com/vllm-project/llm-compressor) by running the code snippet below.
+
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+from llmcompressor.modifiers.quantization import QuantizationModifier
+from llmcompressor.transformers import oneshot
+
+# Load the original model and tokenizer
+model_stub = "Qwen/Qwen2.5-7B-Instruct"
+model_name = model_stub.split("/")[-1]
+
+tokenizer = AutoTokenizer.from_pretrained(model_stub)
+
+model = AutoModelForCausalLM.from_pretrained(
+    model_stub,
+    device_map="auto",
+    torch_dtype="auto",
+)
+
+# Configure the quantization algorithm and scheme:
+# FP8 weights with dynamic per-token FP8 activations; lm_head is kept unquantized
+recipe = QuantizationModifier(
+    targets="Linear",
+    scheme="FP8_dynamic",
+    ignore=["lm_head"],
+)
+
+# Apply quantization (dynamic activation quantization needs no calibration data)
+oneshot(
+    model=model,
+    recipe=recipe,
+)
+
+# Save to disk in compressed-tensors format
+save_path = model_name + "-FP8-dynamic"
+model.save_pretrained(save_path)
+tokenizer.save_pretrained(save_path)
+print(f"Model and tokenizer saved to: {save_path}")
+```
+</details>
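One way to verify the result is to reload the saved checkpoint. A minimal sanity-check sketch, assuming vLLM is installed and that `save_path` resolved to `Qwen2.5-7B-Instruct-FP8-dynamic` as in the snippet above:

```python
# Verification sketch: reload the compressed-tensors checkpoint with vLLM
# and generate from a single prompt. The path matches save_path above.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen2.5-7B-Instruct-FP8-dynamic")
outputs = llm.generate(["What is FP8 quantization?"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```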
 
 ## Evaluation
 