Any code demonstration to run the GGUF model?
Could you provide a demo of running the model with llama.cpp or vLLM?
I thought I could serve it via Ollama, but Ollama doesn't seem to support it yet.
Correct, Ollama does not support it yet, but you can use the other ones you mentioned.
For example, with llama.cpp:
llama-server --port 9090 --n-gpu-layers 99 --ctx-size 65536 --model Chandra-OCR-Q8_0.gguf --mmproj mmproj-F32.gguf
You can then connect to it from your programs through the OpenAI-compatible endpoint at http://localhost:9090/v1.
It also serves a small web UI for testing; open http://localhost:9090 in a browser to try it out.
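Since llama-server speaks the OpenAI chat-completions protocol, you can call it from Python. Here is a minimal sketch, assuming the openai package is installed (pip install openai); the file name page.png, the prompt text, and the model name are placeholders:

import base64
from openai import OpenAI

# llama-server ignores the API key, but the client library requires one.
client = OpenAI(base_url="http://localhost:9090/v1", api_key="none")

# Encode a local image as a base64 data URL ("page.png" is a placeholder).
with open("page.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="Chandra-OCR",  # arbitrary; the server uses whatever model it was started with
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "OCR this page."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_data}"}},
        ],
    }],
)
print(response.choices[0].message.content)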
Thanks for this.
This worked for me:
llama-server --port 9090 --n-gpu-layers 99 --ctx-size 65536 --model Chandra-OCR-Q8_0.gguf --mmproj mmproj-F32.gguf