This is Intel's neural-chat-7b-v3-1, converted to GGUF. No other changes were made.

Two files are available here:

  • neural-chat-7b-v3-1-fp16.gguf: the original model converted to GGUF without quantization
  • neural-chat-7b-v3-1-q8_0-LOT.gguf: the original model converted to GGUF with q8_0 quantization using the --leave-output-tensor command-line option
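
Either file can be run directly with llama.cpp. A minimal sketch using the main example binary, assuming a built llama.cpp checkout and the model file in the working directory (the prompt and token count are placeholders):

```sh
# Load the q8_0 GGUF and generate up to 128 tokens from a test prompt
./main -m neural-chat-7b-v3-1-q8_0-LOT.gguf -p "Hello, how are you?" -n 128
```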

From llama.cpp/quantize --help:

--leave-output-tensor: Will leave output.weight un(re)quantized. Increases model size but may also increase quality, especially when requantizing

The model was converted using convert.py from Georgi Gerganov's llama.cpp repo, commit bbecf3f.
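
For reference, a sketch of how the two files could be reproduced at that commit; the directory and file names here are assumptions, not a record of the exact commands used:

```sh
# Convert the original Hugging Face checkpoint to an unquantized fp16 GGUF
python convert.py ./neural-chat-7b-v3-1 --outtype f16 \
    --outfile neural-chat-7b-v3-1-fp16.gguf

# Quantize to q8_0, keeping output.weight unquantized via --leave-output-tensor
./quantize --leave-output-tensor \
    neural-chat-7b-v3-1-fp16.gguf neural-chat-7b-v3-1-q8_0-LOT.gguf q8_0
```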

All credit belongs to Intel for fine-tuning and releasing this model. Thank you!
