What is your precise workflow?
#1 opened by Ewere
I am trying to create a GGUF from an abliterated version of the base model. I checked out the same llama.cpp commit that you used to create this GGUF, but I get an assertion error when I try to quantize.
My flow is:

1. `python convert_hf_to_gguf.py --outfile out.gguf ~/ssd/GLM-Steam-106B-A12B-v1-Abliterated`
2. `./build/bin/llama-quantize out.gguf out.IQ4_XS.gguf iq4_xs`
Note: for step 2 I was using ik_llama.cpp, but quantizing to Q4_K_M with mainline llama.cpp also fails with an assertion error.
Would you mind sharing what your workflow is here? Or pointing me to where you have it detailed?
Ewere changed discussion status to closed
Ewere changed discussion status to open
This is the assertion error:

`GGML_ASSERT((qs.n_attention_wv == n_attn_layer - pruned_attention_w) && "n_attention_wv is unexpected");`
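For anyone hitting the same assert: it fires when the number of layers that actually carry an attention V-projection weight does not match the attention layer count the quantizer expects, which is exactly the kind of mismatch a pruned or modified model can introduce. A minimal sketch of the check's idea (the tensor names follow llama.cpp's `blk.N.attn_v.weight` convention, but this helper is illustrative, not llama.cpp's actual code):

```python
import re

def count_attn_v_layers(tensor_names):
    """Count distinct layer indices that have an attn_v.weight tensor.

    llama-quantize asserts that this count equals the expected number of
    attention layers (minus any intentionally pruned attention weights);
    a modified model with missing or extra attn_v tensors trips the assert.
    """
    layers = set()
    for name in tensor_names:
        m = re.match(r"blk\.(\d+)\.attn_v\.weight$", name)
        if m:
            layers.add(int(m.group(1)))
    return len(layers)

# Hypothetical tensor list: layer 1 has attn_v, layer 2 does not.
names = [
    "blk.0.attn_v.weight",
    "blk.1.attn_v.weight",
    "blk.2.ffn_up.weight",
]
print(count_attn_v_layers(names))  # → 2
```

Dumping the tensor names of the converted GGUF and counting them this way can show quickly whether the modified model is missing attention tensors before quantization even starts.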
For future reference: the issue was with my modified (abliterated) model, not with the conversion or quantization workflow.
Ewere changed discussion status to closed