Any code demonstration to run the GGUF model?

#2
by tcapitss24

Would you provide a demo for running the model on llama.cpp or vLLM?
I thought I could serve it via Ollama, but Ollama doesn't seem to support it yet.

Correct, Ollama does not support it yet, but you can use the other ones you mentioned.
For example, with llama.cpp:
llama-server --port 9090 --n-gpu-layers 99 --ctx-size 65536 --model Chandra-OCR-Q8_0.gguf --mmproj mmproj-F32.gguf
You can then connect to it from your programs through the OpenAI-compatible endpoint at http://localhost:9090/v1.
It also serves a small web UI for trying it out at http://localhost:9090.
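As a minimal sketch of connecting from a program, any OpenAI-compatible client works against that endpoint. Here is one using the openai Python package; the model name, image path, and prompt are placeholders, not anything the model requires:

import base64
from openai import OpenAI

# Point the client at llama-server's OpenAI-compatible endpoint; the API key is unused
client = OpenAI(base_url="http://localhost:9090/v1", api_key="none")

# Encode the input image as a data URL (page.png is a placeholder path)
with open("page.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="Chandra-OCR",  # llama-server serves whatever model it was started with
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "OCR this page."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)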

Thanks for this.
This worked for me:
llama-server --port 9090 --n-gpu-layers 99 --ctx-size 65536 --model Chandra-OCR-Q8_0.gguf --mmproj mmproj-F32.gguf
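If you want a quick sanity check that the server is reachable before wiring it into a program, something like this should do it (assuming the requests package is installed; /v1/models is part of the server's OpenAI-compatible API):

import requests

# List the loaded model to confirm llama-server is up and answering
r = requests.get("http://localhost:9090/v1/models")
print(r.json())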
