Run with llamacpp

#4 by AbacusGauge - opened

This is now supported by llama.cpp. Good news, especially for Apple Silicon users who would otherwise rely on the "mps" backend.

First download https://huggingface.co/ggml-org/LightOnOCR-1B-1025-GGUF
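If you prefer fetching the files from the command line, one option (assuming the huggingface_hub CLI is installed; the filenames are the Q8_0 files referenced in the command below) is:

huggingface-cli download ggml-org/LightOnOCR-1B-1025-GGUF LightOnOCR-1B-1025-Q8_0.gguf mmproj-LightOnOCR-1B-1025-Q8_0.gguf --local-dir .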

and then

llama-server \
--host 0.0.0.0 \
--port 4183 \
-m "LightOnOCR-1B-1025-Q8_0.gguf" \
--mmproj "mmproj-LightOnOCR-1B-1025-Q8_0.gguf" \
-c 8192 --n_predict 8192 --temp 0.2 \
--top-p 0.9 \
--repeat-penalty 1.0 \
--cache-type-k q8_0 \
--threads 16 \
-ub 2048 -b 2048 --jinja \
-ngl -1
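
For reference, once the server is up you can send a page image through llama-server's OpenAI-compatible chat endpoint. A minimal sketch, assuming a local file page.png and an example prompt (adjust both to your setup):

# encode the image and strip newlines so it fits into the JSON payload
IMG=$(base64 < page.png | tr -d '\n')

curl http://localhost:4183/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [{
          "role": "user",
          "content": [
            {"type": "image_url", "image_url": {"url": "data:image/png;base64,'"$IMG"'"}},
            {"type": "text", "text": "Convert this page to markdown."}
          ]
        }],
        "temperature": 0.2,
        "max_tokens": 8192
      }'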

Can you share the steps to create mmproj-bf16.gguf, please? The following does not create an mmproj file for me:

python3 "convert_hf_to_gguf.py" LightOnOCR-1B-1025 --mmproj --outfile LightOnOCR-1B-1025-bf16.gguf --outtype bf16

If you have the latest GitHub version, that's the correct command; you actually created the mmproj as "LightOnOCR-1B-1025-bf16.gguf".

Oh! Thanks, it works now:

python3 convert_hf_to_gguf.py . --mmproj --outtype bf16
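
For completeness, if you also want the main model GGUF and a Q8_0 like the files used above, a rough sketch (the output filenames here are just placeholders; pick your own):

# text model (run from the same checkout as the --mmproj call above)
python3 convert_hf_to_gguf.py . --outtype bf16 --outfile LightOnOCR-1B-1025-text-bf16.gguf

# optional: quantize the bf16 text model down to Q8_0 with llama.cpp's llama-quantize
llama-quantize LightOnOCR-1B-1025-text-bf16.gguf LightOnOCR-1B-1025-Q8_0.gguf Q8_0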

Made some more quants for it:
https://huggingface.co/noctrex/LightOnOCR-1B-1025-GGUF
https://huggingface.co/noctrex/LightOnOCR-1B-1025-i1-GGUF

I have downloaded your BF16 GGUF and it is working perfectly. Very nice results from the model. (y)

Can you please let me know how we can use this with Ollama?
