What is your precise workflow?
#1 opened by Ewere
I am trying to create a GGUF from an abliterated version of the base model. I checked out the same llama.cpp commit that you used to create this GGUF, but I get an assertion error when I try to quantize.
My flow is:

1. `python convert_hf_to_gguf.py --outfile out.gguf ~/ssd/GLM-Steam-106B-A12B-v1-Abliterated`
2. `./build/bin/llama-quantize out.gguf out.IQ4_XS.gguf iq4_xs`
Note: for step 2 I was using ik_llama.cpp, but quantizing to Q4_K_M with mainline llama.cpp also fails with an assertion error.
Would you mind sharing what your workflow is here? Or pointing me to where you have it detailed?
Ewere changed discussion status to closed
Ewere changed discussion status to open
This is the assertion error:

`GGML_ASSERT((qs.n_attention_wv == n_attn_layer - pruned_attention_w) && "n_attention_wv is unexpected");`
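For anyone hitting the same assert: it fires when the number of layers that actually carry an attention V-projection weight does not match the attention layer count the quantizer expects, which is exactly the kind of mismatch a pruned or modified model can introduce. A minimal sketch of the check's idea (the tensor names follow llama.cpp's `blk.N.attn_v.weight` convention, but this helper is illustrative, not llama.cpp's actual code):

```python
import re

def count_attn_v_layers(tensor_names):
    """Count distinct layer indices that have an attn_v.weight tensor.

    llama-quantize asserts that this count equals the expected number of
    attention layers (minus any intentionally pruned attention weights);
    a modified model with missing or extra attn_v tensors trips the assert.
    """
    layers = set()
    for name in tensor_names:
        m = re.match(r"blk\.(\d+)\.attn_v\.weight$", name)
        if m:
            layers.add(int(m.group(1)))
    return len(layers)

# Hypothetical tensor list: layer 1 has attn_v, layer 2 does not.
names = [
    "blk.0.attn_v.weight",
    "blk.1.attn_v.weight",
    "blk.2.ffn_up.weight",
]
print(count_attn_v_layers(names))  # → 2
```

Dumping the tensor names of the converted GGUF and counting them this way can show quickly whether the modified model is missing attention tensors before quantization even starts.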
For future reference: the issue was with my modified (abliterated) model, not with the conversion or quantization workflow.
Ewere changed discussion status to closed