alexmarques committed (verified)
Commit a6e7ead · 1 parent: d3edd4c

Update README.md

Files changed (1):
  1. README.md (+44, -0)

README.md CHANGED
@@ -70,6 +70,50 @@ print(generated_text)
 
 vLLM also supports OpenAI-compatible serving. See the [documentation](https://docs.vllm.ai/en/latest/) for more details.
 
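As a brief illustration of the OpenAI-compatible mode, here is a minimal client-side sketch. It assumes a server is already running for this checkpoint; the served model name is a placeholder, so substitute the actual repository or local path:

```python
# Minimal sketch of querying a vLLM OpenAI-compatible server.
# Assumes the server was started first, e.g.:
#   vllm serve Qwen2.5-7B-Instruct-FP8-dynamic
# The model id below is a placeholder; use the name the server reports.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen2.5-7B-Instruct-FP8-dynamic",
    messages=[{"role": "user", "content": "Give me a short introduction to large language models."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```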
+## Creation
+
+<details>
+<summary>Creation details</summary>
+This model was created with [llm-compressor](https://github.com/vllm-project/llm-compressor) by running the code snippet below.
+
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+from llmcompressor.modifiers.quantization import QuantizationModifier
+from llmcompressor.transformers import oneshot
+
+# Load the original model and tokenizer
+model_stub = "Qwen/Qwen2.5-7B-Instruct"
+model_name = model_stub.split("/")[-1]
+
+tokenizer = AutoTokenizer.from_pretrained(model_stub)
+
+model = AutoModelForCausalLM.from_pretrained(
+    model_stub,
+    device_map="auto",
+    torch_dtype="auto",
+)
+
+# Configure the quantization algorithm and scheme:
+# FP8 weights with dynamic per-token FP8 activations; lm_head is kept unquantized
+recipe = QuantizationModifier(
+    targets="Linear",
+    scheme="FP8_dynamic",
+    ignore=["lm_head"],
+)
+
+# Apply quantization (dynamic activation quantization needs no calibration data)
+oneshot(
+    model=model,
+    recipe=recipe,
+)
+
+# Save to disk in compressed-tensors format
+save_path = model_name + "-FP8-dynamic"
+model.save_pretrained(save_path)
+tokenizer.save_pretrained(save_path)
+print(f"Model and tokenizer saved to: {save_path}")
+```
+</details>
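One way to verify the result is to reload the saved checkpoint. A minimal sanity-check sketch, assuming vLLM is installed and that `save_path` resolved to `Qwen2.5-7B-Instruct-FP8-dynamic` as in the snippet above:

```python
# Verification sketch: reload the compressed-tensors checkpoint with vLLM
# and generate from a single prompt. The path matches save_path above.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen2.5-7B-Instruct-FP8-dynamic")
outputs = llm.generate(["What is FP8 quantization?"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```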
 
 ## Evaluation
 