Model Overview

Model Architecture: DeepSeek-R1-0528
- Input: Text
- Output: Text
Supported Hardware Microarchitecture: AMD MI350/MI355
ROCm: 7.0
Operating System(s): Linux
Inference Engine: SGLang/vLLM
Model Optimizer: AMD-Quark (V0.10)
- Weight quantization: Per-channel, FP8E4M3, Static
- Activation quantization: Per-token, FP8E4M3, Dynamic
Calibration Dataset: Pile

This model was built from the deepseek-ai/DeepSeek-R1-0528 model by applying AMD-Quark FP8E4M3 PTPC (per-token activation, per-channel weight) quantization.

Model Quantization

The model was quantized from deepseek-ai/DeepSeek-R1-0528 using AMD-Quark. Both weights and activations are quantized to FP8 (E4M3): weights per channel with static scales, and activations per token with dynamic scales.
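To make the per-channel and per-token terminology concrete, the following is a minimal NumPy sketch of how the two kinds of scales could be computed and combined. It is an illustration under stated assumptions, not AMD-Quark's implementation: the helper names and the toy tensor shapes are hypothetical, and values are only scaled and clipped to the E4M3 range, not rounded to the actual FP8 grid.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in FP8 E4M3


def per_channel_scales(weight):
    """One scale per output channel (row) of an [out_features, in_features] weight."""
    return np.abs(weight).max(axis=1, keepdims=True) / FP8_E4M3_MAX


def per_token_scales(activations):
    """One scale per token (row) of a [num_tokens, hidden] activation matrix."""
    return np.abs(activations).max(axis=1, keepdims=True) / FP8_E4M3_MAX


def scale_and_clip(x, scales):
    """Map values into the FP8 range. Rounding to the FP8 grid is omitted here."""
    return np.clip(x / scales, -FP8_E4M3_MAX, FP8_E4M3_MAX)


rng = np.random.default_rng(0)
weight = rng.standard_normal((4, 8))  # toy shape: 4 output channels, hidden size 8
acts = rng.standard_normal((3, 8))    # toy shape: 3 tokens, hidden size 8

w_scales = per_channel_scales(weight)  # static: would be computed once, offline
a_scales = per_token_scales(acts)      # dynamic: recomputed for every input
w_q = scale_and_clip(weight, w_scales)
a_q = scale_and_clip(acts, a_scales)

# Matmul in the scaled domain, then undo both scales with their outer product.
out = (a_q @ w_q.T) * (a_scales @ w_scales.T)
```

Because the rounding step is left out, `out` matches the unquantized product `acts @ weight.T` up to floating-point error. In a real FP8 kernel the rounding introduces the quantization error that the accuracy comparison measures.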

Preprocessing requirement:

Before executing the quantization script below, the original FP8 model must first be dequantized to BFloat16. You can either perform the dequantization manually using this conversion script, or use the pre-converted BFloat16 model available at unsloth/DeepSeek-R1-0528-BF16.

Quantization script:

# pip install amd-quark
from transformers import AutoTokenizer, AutoModelForCausalLM

from quark.torch import ModelQuantizer, export_safetensors
from quark.torch.quantization import FP8E4M3PerChannelSpec
from quark.torch.quantization.config.config import Config, QuantizationConfig

ckpt_path = "unsloth/DeepSeek-R1-0528-BF16"
exclude_layers = ["lm_head","*mlp.gate"]
output_dir = ckpt_path.rstrip("/").split("/")[-1] + "-ptpc"

# Load the original floating-point model
model = AutoModelForCausalLM.from_pretrained(ckpt_path, device_map="auto", torch_dtype="auto", trust_remote_code=True)
model.eval()
tokenizer = AutoTokenizer.from_pretrained(ckpt_path)

# Set the quantization configuration
FP8_PER_CHANNEL_SPEC = FP8E4M3PerChannelSpec(is_dynamic=False, ch_axis=0).to_quantization_spec()
FP8_PER_TOKEN_DYNAMIC_SPEC = FP8E4M3PerChannelSpec(is_dynamic=True, ch_axis=1).to_quantization_spec()
W_FP8_PER_CHANNEL_STATIC_A_FP8_PER_TOKEN_DYNAMIC_CONFIG = QuantizationConfig(input_tensors=FP8_PER_TOKEN_DYNAMIC_SPEC, weight=FP8_PER_CHANNEL_SPEC)
quant_config = Config(global_quant_config=W_FP8_PER_CHANNEL_STATIC_A_FP8_PER_TOKEN_DYNAMIC_CONFIG, exclude=exclude_layers)

# Apply quantization
quantizer = ModelQuantizer(quant_config)
model = quantizer.quantize_model(model)

# Export quantized model
model = quantizer.freeze(model)
export_safetensors(model, output_dir)
tokenizer.save_pretrained(output_dir)
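The `exclude_layers` patterns in the script above decide which modules stay unquantized. The sketch below shows how such patterns would select layers, assuming glob-style matching comparable to Python's `fnmatch`. AMD-Quark's actual matching rules may differ, and the layer names are illustrative examples rather than a dump of the real model.

```python
from fnmatch import fnmatchcase

exclude_layers = ["lm_head", "*mlp.gate"]

# Illustrative module names (assumed, not read from the checkpoint).
layer_names = [
    "lm_head",
    "model.layers.3.mlp.gate",
    "model.layers.3.mlp.experts.0.gate_proj",
    "model.layers.3.self_attn.o_proj",
]


def is_excluded(name, patterns):
    """Return True if the name matches any glob pattern (fnmatch semantics assumed)."""
    return any(fnmatchcase(name, pattern) for pattern in patterns)


excluded = [name for name in layer_names if is_excluded(name, exclude_layers)]
quantized = [name for name in layer_names if not is_excluded(name, exclude_layers)]
print("excluded: ", excluded)
print("quantized:", quantized)
```

Under this assumption, `*mlp.gate` matches only names that end in `mlp.gate`, so a projection such as `gate_proj` is still quantized while the gate module itself and `lm_head` are left alone.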

Accuracy

Benchmark	DeepSeek-R1-0528	DeepSeek-R1-0528-ptpc (this model)
GPQA-diamond	80.72	80.05
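As a quick reading of the table, the gap between the two scores can be expressed as an absolute drop and as a recovery ratio. The snippet below is only arithmetic on the two reported numbers. The variable names are illustrative.

```python
baseline_score = 80.72  # DeepSeek-R1-0528 on GPQA-diamond
ptpc_score = 80.05      # DeepSeek-R1-0528-ptpc on GPQA-diamond

absolute_drop = baseline_score - ptpc_score
recovery_pct = 100.0 * ptpc_score / baseline_score

print(f"absolute drop: {absolute_drop:.2f} points")
print(f"recovery: {recovery_pct:.2f}% of the baseline score")
```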

Reproduction

Docker: rocm/vllm-private:rocm7.1_ubuntu22.04_vllm0.11.2_ptpc_fp8

vllm version: 0.10.1.1+rocm710

aiter version: 0.1.6.post2.dev55+g59bd8ff2c

lighteval version: 0.12.2

MODEL_ARGS="model_name=/amd/DeepSeek-R1-0528-ptpc,dtype=auto,tensor_parallel_size=8,max_model_length=71536,max_num_batched_tokens=65536,gpu_memory_utilization=0.9,generation_parameters={max_new_tokens:65536,temperature:0.6,top_p:0.95,seed:100}"
lighteval vllm $MODEL_ARGS "lighteval|gpqa:diamond|0"
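The `MODEL_ARGS` string is dense, so the following sketch rebuilds it from two dictionaries to make the individual settings easier to read and change. The dictionary and variable names are hypothetical. The keys and values are copied from the command above.

```python
engine_args = {
    "model_name": "/amd/DeepSeek-R1-0528-ptpc",
    "dtype": "auto",
    "tensor_parallel_size": 8,
    "max_model_length": 71536,
    "max_num_batched_tokens": 65536,
    "gpu_memory_utilization": 0.9,
}
generation_parameters = {
    "max_new_tokens": 65536,
    "temperature": 0.6,
    "top_p": 0.95,
    "seed": 100,
}

# Join the generation parameters as key:value and the engine arguments as key=value.
gen = ",".join(f"{key}:{value}" for key, value in generation_parameters.items())
model_args = ",".join(f"{key}={value}" for key, value in engine_args.items())
model_args += f",generation_parameters={{{gen}}}"

# Context tokens left for the prompt once the generation budget is reserved.
prompt_budget = engine_args["max_model_length"] - generation_parameters["max_new_tokens"]

print(model_args)
print("prompt budget:", prompt_budget)
```

With these values, `max_model_length` exceeds `max_new_tokens` by 6000 tokens, which is the room available for the prompt when a sample uses its full generation budget.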

Deployment

This model can be deployed efficiently using the vLLM backend.
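A minimal offline-inference sketch with vLLM's Python API might look like the following. It assumes a ROCm build of vLLM with PTPC FP8 support and eight visible GPUs, mirroring the `tensor_parallel_size=8` used in the evaluation. The function name and the `max_tokens` value are illustrative choices, and the temperature and top-p are taken from the evaluation settings, not from an official recommendation.

```python
# Sampling settings: temperature and top_p follow the evaluation command,
# max_tokens is an arbitrary illustrative value.
SAMPLING = {"temperature": 0.6, "top_p": 0.95, "max_tokens": 256}


def generate_once(prompt, model="amd/DeepSeek-R1-0528-ptpc", tensor_parallel_size=8):
    """Load the model with vLLM and return the completion for a single prompt."""
    # Imported lazily so this module can be loaded on machines without vLLM.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model=model,
        tensor_parallel_size=tensor_parallel_size,
        trust_remote_code=True,
    )
    outputs = llm.generate([prompt], SamplingParams(**SAMPLING))
    return outputs[0].outputs[0].text
```

Calling `generate_once("Explain FP8 quantization in one paragraph.")` would load the full model, so it is only practical on a node with enough accelerator memory.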

License
