Any code demonstration to run the GGUF model?
Could you provide a demo of running the model with llama.cpp or vLLM?
I thought I could serve it via Ollama, but Ollama doesn't seem to support it yet.
Correct, Ollama does not support it yet, but you can use the other ones you mentioned.
For example, with llama.cpp:
llama-server --port 9090 --n-gpu-layers 99 --ctx-size 65536 --model Chandra-OCR-Q8_0.gguf --mmproj mmproj-F32.gguf
You can then connect to it from your programs through the OpenAI-compatible endpoint at http://localhost:9090/v1.
It also serves a small web UI for testing; open http://localhost:9090 in a browser to try it out.
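Since llama-server speaks the OpenAI chat-completions protocol, you can call it from Python. Here is a minimal sketch, assuming the openai package is installed (pip install openai); the file name page.png, the prompt text, and the model name are placeholders:

import base64
from openai import OpenAI

# llama-server ignores the API key, but the client library requires one.
client = OpenAI(base_url="http://localhost:9090/v1", api_key="none")

# Encode a local image as a base64 data URL ("page.png" is a placeholder).
with open("page.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="Chandra-OCR",  # arbitrary; the server uses whatever model it was started with
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "OCR this page."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_data}"}},
        ],
    }],
)
print(response.choices[0].message.content)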
Thanks for this.
This worked for me:
llama-server --port 9090 --n-gpu-layers 99 --ctx-size 65536 --model Chandra-OCR-Q8_0.gguf --mmproj mmproj-F32.gguf