AmpereComputing/ernie-4.5-a3b-21b-thinking-gguf
Ampere's quantization formats (Q4_K_4 / Q8R16) require the Ampere-optimized build of llama.cpp, available here: https://hub.docker.com/r/amperecomputingai/llama.cpp
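A minimal sketch of pulling the image from the Docker Hub link above and running inference against a local GGUF file. The tag, entrypoint flags, and model filename below are assumptions for illustration; check the image's documentation and the repository's file list for the actual names.

```shell
# Pull the Ampere-optimized llama.cpp image (required for the
# Q4_K_4 / Q8R16 quantization formats).
docker pull amperecomputingai/llama.cpp:latest

# Run inference, mounting a local directory that holds the GGUF file.
# The model filename is hypothetical -- use the actual file from the
# AmpereComputing/ernie-4.5-a3b-21b-thinking-gguf repository, and
# consult the image docs for its entrypoint and supported flags.
docker run --rm -v "$PWD/models:/models" amperecomputingai/llama.cpp:latest \
  -m /models/ernie-4.5-a3b-21b-thinking.Q8R16.gguf \
  -p "Hello" -n 64
```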