Unable to run with vLLM
When I try to serve the model through vLLM, I get the Pydantic error below. Can you please help me resolve this?

```
(APIServer pid=3244) Value error, Model architectures ['LightOnOCRForConditionalGeneration'] are not supported for now.
```
Hi,
You'll need to use the vLLM nightly build, as support for LightOnOCR is not in the latest release yet.
Try the installation steps in the model card to get the latest nightly build.
Can you please give a clear installation script? I mean, one that uses nightly vLLM, or a clear, feasible workaround.
Thanks.
Hi,
These exact commands are tested and verified to work:

```bash
uv venv --python 3.12 --seed
source .venv/bin/activate

# Install the vLLM nightly build together with triton-kernels
uv pip install -U vllm \
    'triton-kernels @ git+https://github.com/triton-lang/[email protected]#subdirectory=python/triton_kernels' \
    --torch-backend=auto \
    --extra-index-url https://wheels.vllm.ai/nightly \
    --prerelease=allow

# Start the server
vllm serve lightonai/LightOnOCR-1B-1025 \
    --limit-mm-per-prompt '{"image": 1}' \
    --async-scheduling
```
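Once the server is up, you can sanity-check it through the OpenAI-compatible API that vllm serve exposes. Below is a minimal client sketch, not the reference script from the model card: the port, the `page.png` path, and the image-only message layout are assumptions, so adapt them to your setup.

```python
# Minimal check against the vLLM OpenAI-compatible server started above.
# Assumes the server listens on localhost:8000 and that page.png is a
# rendered page image (e.g. exported from your PDF); both are placeholders.
import base64

from openai import OpenAI  # pip install openai

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

with open("page.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="lightonai/LightOnOCR-1B-1025",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                }
            ],
        }
    ],
    max_tokens=2048,
    temperature=0.0,
)

print(response.choices[0].message.content)
```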
Thanks for the clarification; the model is being recognized now. However, I am still unable to serve it successfully, and I suspect the cause is that my GPU and system specs are too low (I can see some memory-related errors in the logs).
I just want to load and run the model on a single-page PDF for testing, using the reference Python script provided (with a one-page PDF for my use case). What would be the minimum system/GPU requirements to run this successfully?
Thanks
Since the model is only 1B parameters, you should be able to run it with 16 GB of VRAM or even less. You could also try reducing --max-model-len in vllm serve to something like 4096.
Here is an example with Hugging Face Transformers running on Colab.
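For reference, a minimal sketch of that Transformers path might look like the following. It assumes a recent Transformers release that ships the LightOnOCR architecture and uses the generic `AutoProcessor` / `AutoModelForImageTextToText` entry points; the image URL is a placeholder, so treat this as a starting point rather than the official Colab example.

```python
# Minimal sketch of running LightOnOCR through Hugging Face Transformers.
# Assumes a recent Transformers release that includes the LightOnOCR
# architecture; the generic Auto classes and the placeholder image URL
# are assumptions, not the official example from the Colab notebook.
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "lightonai/LightOnOCR-1B-1025"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# A single rendered page image; replace the URL with your own page.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/page.png"},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=1024)

new_tokens = generated[0][inputs["input_ids"].shape[1]:]
print(processor.decode(new_tokens, skip_special_tokens=True))
```

If the Auto classes don't resolve the model in your installed version, upgrading Transformers or following the Colab notebook exactly is the safer route.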
Thanks for your inputs. I tried the Transformers integration and it works! (I have yet to try the vLLM approach with your suggestions.)
I have a question about the model's capabilities and how it works: I read your blog post, and since this model was designed and trained specifically to perform 'one-pass OCR', I want to check whether my understanding below is right:
- The model will not produce bounding boxes for the detected text, or any other spatial output, since it was designed to avoid the 'pipeline' approach of traditional OCR systems?
- Is it possible to interact with the document through prompts, or is that not possible at all since the model's downstream task is OCR?
Thanks