Run with llama.cpp
This is now supported by llama.cpp. Good news, especially for "mps" Apple Silicon users.
First download the GGUF files from https://huggingface.co/ggml-org/LightOnOCR-1B-1025-GGUF and then run:
llama-server \
--host 0.0.0.0 \
--port 4183 \
-m "LightOnOCR-1B-1025-Q8_0.gguf" \
--mmproj "mmproj-LightOnOCR-1B-1025-Q8_0.gguf" \
-c 8192 --n_predict 8192 --temp 0.2 \
--top-p 0.9 \
--repeat-penalty 1.0 \
--cache-type-k q8_0 \
--threads 16 \
-ub 2048 -b 2048 --jinja \
-ngl -1
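Once the server is up, you can send a page image to the OpenAI-compatible chat endpoint that llama-server exposes. A minimal sketch, assuming the server is listening on port 4183 as above and you have a local page.png to transcribe; the prompt text and file names are placeholders, not anything prescribed by the model card:

# encode the image (on macOS use: base64 -i page.png -o page.b64)
base64 -w0 page.png > page.b64
curl http://localhost:4183/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "{
    \"messages\": [{
      \"role\": \"user\",
      \"content\": [
        {\"type\": \"image_url\", \"image_url\": {\"url\": \"data:image/png;base64,$(cat page.b64)\"}},
        {\"type\": \"text\", \"text\": \"Transcribe this page to markdown.\"}
      ]
    }],
    \"temperature\": 0.2,
    \"max_tokens\": 8192
  }"

The transcription comes back in choices[0].message.content of the JSON response.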
Made some more quants for it:
https://huggingface.co/noctrex/LightOnOCR-1B-1025-GGUF
https://huggingface.co/noctrex/LightOnOCR-1B-1025-i1-GGUF
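If it helps, the Hugging Face CLI is a convenient way to grab a single quant from these repos. A quick sketch; the exact filenames may differ, so adjust the --include pattern to the quant you want:

pip install -U "huggingface_hub[cli]"
# pulls only the Q8_0 file (pattern is an example, check the repo's file list)
huggingface-cli download noctrex/LightOnOCR-1B-1025-GGUF \
  --include "*Q8_0*.gguf" \
  --local-dir LightOnOCR-1B-1025-GGUF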
Can you share the steps to create mmproj-bf16.gguf, please? The following does not create a mmproj file for me:
python3 "convert_hf_to_gguf.py" LightOnOCR-1B-1025 --mmproj --outfile LightOnOCR-1B-1025-bf16.gguf --outtype bf16
If you have the latest GitHub version, that's the correct command; you actually created the mmproj as "LightOnOCR-1B-1025-bf16.gguf".
Oh! Thanks, it works now:
python3 convert_hf_to_gguf.py . --mmproj --outtype bf16
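For anyone else following along, the rough end-to-end flow is sketched below. It assumes a recent llama.cpp checkout with its Python requirements installed and llama-quantize built; the upstream model repo id (lightonai/LightOnOCR-1B-1025) and the build path are assumptions, so adjust them to your setup:

git clone https://github.com/ggml-org/llama.cpp
pip install -r llama.cpp/requirements.txt

# grab the original HF weights (repo id assumed, check the model card)
huggingface-cli download lightonai/LightOnOCR-1B-1025 --local-dir LightOnOCR-1B-1025

# text model in bf16
python3 llama.cpp/convert_hf_to_gguf.py LightOnOCR-1B-1025 \
  --outtype bf16 --outfile LightOnOCR-1B-1025-bf16.gguf

# vision projector (mmproj) in bf16
python3 llama.cpp/convert_hf_to_gguf.py LightOnOCR-1B-1025 --mmproj \
  --outtype bf16 --outfile mmproj-LightOnOCR-1B-1025-bf16.gguf

# optionally quantize the text model; the mmproj is usually left in bf16/f16
llama.cpp/build/bin/llama-quantize LightOnOCR-1B-1025-bf16.gguf LightOnOCR-1B-1025-Q8_0.gguf Q8_0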
I have downloaded your BF16 GGUF. It is working perfectly. Very nice results from the model. (y)
Can you please let me know how we can use this with Ollama?