Note that the MTP layers of this model are also PTPC-quantized.

Model Overview

  • Model Architecture: DeepSeek-V3.2
    • Input: Text
    • Output: Text
  • Supported Hardware Microarchitecture: AMD MI350/MI355
  • ROCm: 7.0
  • Operating System(s): Linux
  • Inference Engine: SGLang/vLLM
  • Model Optimizer: AMD Quark (v0.10)
    • Weight quantization: Per-channel, FP8E4M3, Static
    • Activation quantization: Per-token, FP8E4M3, Dynamic
  • Calibration Dataset: Pile

This model was built from the deepseek-ai/DeepSeek-V3.2 model by applying AMD Quark for FP8E4M3 PTPC (per-token activation, per-channel weight) quantization.

Model Quantization

The model was quantized from deepseek-ai/DeepSeek-V3.2 using AMD Quark. Weights are quantized to FP8E4M3 with static per-channel scales, and activations are quantized to FP8E4M3 with dynamic per-token scales.
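
For reference, a minimal sketch of producing such a checkpoint with Quark's llm_ptq example script is shown below. This is not the exact command used for this model: the flags follow Quark's published LLM PTQ example, and the --quant_scheme value in particular is an assumption (consult the Quark v0.10 documentation for the exact PTPC scheme identifier).

# Minimal sketch based on Quark's llm_ptq example; NOT the exact command used
# to produce this checkpoint. The --quant_scheme value is an assumption:
# check the Quark v0.10 docs for the exact PTPC (per-channel weight,
# per-token dynamic activation) scheme name.
python3 quantize_quark.py \
  --model_dir deepseek-ai/DeepSeek-V3.2 \
  --output_dir DeepSeek-V3.2-ptpc \
  --quant_scheme w_fp8_per_channel_a_fp8_per_token \
  --num_calib_data 128 \
  --model_export hf_format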

Accuracy

Benchmark   DeepSeek-V3.2   DeepSeek-V3.2-ptpc (this model)
gsm8k       96.00           95.75

Reproduction

  • Docker: rocm/vllm-private:rocm7.1_ubuntu22.04_vllm0.11.2_ptpc_fp8 (see the launch example after this list)
  • vLLM version: 0.11.2.dev521+gad32e3e19.rocm710
  • AITER version: 0.1.6.post2.dev55+g59bd8ff2c
  • lm_eval version: 0.4.9.2
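
A typical way to launch the container with ROCm GPU access is sketched below; the device mappings are the standard ROCm ones, and the host mount path is illustrative.

# Launch the container with ROCm GPU access; the host mount path is illustrative.
docker run -it --rm \
  --device=/dev/kfd --device=/dev/dri \
  --group-add video \
  --ipc=host --shm-size 16G \
  --security-opt seccomp=unconfined \
  -v /model_path:/model_path \
  rocm/vllm-private:rocm7.1_ubuntu22.04_vllm0.11.2_ptpc_fp8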

# Use the vLLM V1 engine
export VLLM_USE_V1=1
# Load safetensors weights directly to GPU
export SAFETENSORS_FAST_GPU=1
# Enable AMD AITER kernels on ROCm, including the AITER MoE path
export VLLM_ROCM_USE_AITER=1
export VLLM_ROCM_USE_AITER_MOE=1
model_path="/model_path/deepseek-ai/DeepSeek-V3.2-ptpc"
vllm serve $model_path \
  --tensor-parallel-size 8 \
  --data-parallel-size 1 \
  --max-num-batched-tokens 32768 \
  --trust-remote-code \
  --no-enable-prefix-caching \
  --disable-log-requests \
  --kv-cache-dtype bfloat16 \
  --gpu-memory-utilization 0.85 \
  --compilation-config '{"cudagraph_mode": "FULL_AND_PIECEWISE"}' \
  --block-size 1
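
Once the server is up, the OpenAI-compatible completions endpoint can be sanity-checked with a simple request (the prompt and token budget below are arbitrary):

# Quick sanity check against the OpenAI-compatible endpoint;
# the prompt and max_tokens are arbitrary.
curl http://127.0.0.1:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "/model_path/deepseek-ai/DeepSeek-V3.2-ptpc",
        "prompt": "The capital of France is",
        "max_tokens": 32
      }'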

lm_eval \
  --model local-completions \
  --tasks gsm8k \
  --model_args model=/model_path/deepseek-ai/DeepSeek-V3.2-ptpc,base_url=http://127.0.0.1:8000/v1/completions \
  --batch_size auto \
  --limit 400
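
Note that the model= argument in --model_args must match the name the server registers, which defaults to the model path when no alias is given. If a shorter identifier is preferred, vLLM's --served-model-name flag sets one (the alias below is illustrative):

# Register a short served-model name (alias is illustrative); lm_eval's
# model= argument must then match it, e.g. model=deepseek-v3.2-ptpc.
vllm serve $model_path \
  --served-model-name deepseek-v3.2-ptpc \
  --tensor-parallel-size 8   # ...plus the remaining flags from the serve command above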

Deployment

This model can be deployed efficiently with the vLLM or SGLang inference engines; the Reproduction section above shows a vLLM serving setup.

License

Modifications Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.
