What is your precise workflow?

#1
by Ewere - opened

I am trying to create a GGUF from an abliterated version of the base model. I am on the same llama.cpp commit that you used to create this GGUF, but I get an assertion error when I quantize.

My workflow is:
1. python convert_hf_to_gguf.py --outfile out.gguf ~/ssd/GLM-Steam-106B-A12B-v1-Abliterated
2. ./build/bin/llama-quantize out.gguf out.IQ4_XS.gguf iq4_xs

Note: for step 2 I was using ik_llama.cpp, but quantizing to Q4_K_M with mainline llama.cpp hits the same assertion.
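For context, the two steps above can be sketched as one script. This is only an illustrative sketch, not the quant author's confirmed workflow: the model directory is the path from my setup, the output filenames are placeholders, and the exact llama.cpp commit matters for recently added architectures.

```shell
#!/usr/bin/env sh
set -e  # stop on the first failing step

# Placeholder paths from my setup; adjust for yours.
MODEL_DIR="$HOME/ssd/GLM-Steam-106B-A12B-v1-Abliterated"
F16_OUT="out.gguf"

# Step 1: convert the HF-format checkpoint to an unquantized GGUF.
python convert_hf_to_gguf.py --outfile "$F16_OUT" "$MODEL_DIR"

# Step 2: quantize (IQ4_XS here; Q4_K_M trips the same assert on a broken model).
./build/bin/llama-quantize "$F16_OUT" out.IQ4_XS.gguf iq4_xs
```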

Would you mind sharing what your workflow is here? Or pointing me to where you have it detailed?


This is the assertion error:

GGML_ASSERT((qs.n_attention_wv == n_attn_layer - pruned_attention_w) && "n_attention_wv is unexpected");
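For anyone hitting the same assertion: it fires when the number of attention V-projection tensors found in the GGUF does not match the model's layer count (minus any pruned attention layers). A quick sanity check is to count `blk.N.attn_v.weight` tensors in the converted file and compare against the block count. A minimal sketch, using only illustrative tensor-name matching (the helper name and toy data are mine, not from llama.cpp):

```python
import re

def count_attn_v(tensor_names):
    """Count per-layer attention V tensors (GGUF naming: blk.N.attn_v.weight)."""
    return sum(1 for n in tensor_names if re.fullmatch(r"blk\.\d+\.attn_v\.weight", n))

# Toy example: a 2-layer model where layer 1 lost its attn_v tensor,
# the kind of mismatch that would trip the assert above.
names = [
    "blk.0.attn_v.weight",
    "blk.0.attn_k.weight",
    "blk.1.attn_k.weight",  # blk.1.attn_v.weight missing
]
n_layer = 2
print(count_attn_v(names) == n_layer)  # prints False for this broken model
```

On a real file you could pull the tensor names with the `gguf` Python package's `GGUFReader` and run the same count against the model's block count.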

For future reference, the issue was with my modified model.

Ewere changed discussion status to closed
