Instructions to use nvidia/NVLM-D-72B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use nvidia/NVLM-D-72B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="nvidia/NVLM-D-72B", trust_remote_code=True)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

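# Load model directly
# NVLM_D is defined in the repository's remote code, not in transformers itself,
# so it is loaded through AutoModel with trust_remote_code=True.
from transformers import AutoModel
model = AutoModel.from_pretrained("nvidia/NVLM-D-72B", trust_remote_code=True, dtype="auto")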

Local Apps

vLLM

How to use nvidia/NVLM-D-72B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "nvidia/NVLM-D-72B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nvidia/NVLM-D-72B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/nvidia/NVLM-D-72B

SGLang

How to use nvidia/NVLM-D-72B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "nvidia/NVLM-D-72B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nvidia/NVLM-D-72B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "nvidia/NVLM-D-72B" \
        --host 0.0.0.0 \
        --port 30000

Docker Model Runner
How to use nvidia/NVLM-D-72B with Docker Model Runner:
```
docker model run hf.co/nvidia/NVLM-D-72B
```

Cannot host with vLLM?

#17

by tommywu052 - opened Oct 9, 2024

Discussion

tommywu052

Oct 9, 2024

When I try to host the model as an API endpoint with
vllm serve "nvidia/NVLM-D-72B" --trust-remote-code
it throws this error:
Model architectures ['NVLM_D'] are not supported for now

boxin-wbx

NVIDIA org Oct 10, 2024

We currently do not support vLLM but are actively working on integrating NVLM with vLLM. Our team is committed to delivering this support as soon as possible.

Thanks,
Boxin

jeff1jeffo

Oct 11, 2024

Support seems to have been added in this PR, https://github.com/vllm-project/vllm/pull/9045, but it has not been released yet.
Installing vLLM from the latest source might work.

ersanbil

Oct 11, 2024

Checking out the latest main and building the Docker image worked for me.

tommywu052

Oct 12, 2024

Thanks, guys. It works well after installing the latest nightly wheel:
pip install https://vllm-wheels.s3.us-west-2.amazonaws.com/nightly/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl
and serving the model with:
vllm serve nvidia/NVLM-D-72B --tensor-parallel-size 4 --enforce-eager --max-num-seqs 16 --trust-remote-code
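
Once a server started this way is running, it can be queried from Python as well as from curl. The following is a minimal sketch, assuming the server listens on vLLM's default port 8000 and exposes the OpenAI-compatible /v1/chat/completions route. The helper names build_chat_request and post_chat are hypothetical, and the image URL is only a placeholder.

```python
import json
import urllib.request


def build_chat_request(model, prompt, image_url):
    """Build an OpenAI-style chat payload with one text part and one image part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def post_chat(payload, base_url="http://localhost:8000"):
    """POST the payload to /v1/chat/completions and return the decoded JSON reply.

    base_url is an assumption: adjust it to wherever the server is listening.
    """
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Building the payload needs no running server.
payload = build_chat_request(
    "nvidia/NVLM-D-72B",
    "Describe this image in one sentence.",
    "https://example.com/placeholder.jpg",  # placeholder image URL
)
```

With the server up, post_chat(payload) should return a JSON object whose generated text sits under choices[0]["message"]["content"], as in other OpenAI-compatible APIs.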

Malini

Oct 23, 2024

edited Oct 23, 2024

Could you please share the hardware specs this was run on? I tried running it on Colab Pro with an A100 and it did not work.
Thanks in advance.

ValueError: The number of required GPUs exceeds the total number of available GPUs in the placement group.
Traceback (most recent call last):
  File "/usr/local/bin/vllm", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/vllm/scripts.py", line 195, in main
    args.dispatch_function(args)
  File "/usr/local/lib/python3.10/dist-packages/vllm/scripts.py", line 41, in serve
    uvloop.run(run_server(args))
  File "/usr/local/lib/python3.10/dist-packages/uvloop/__init__.py", line 82, in run
    return loop.run_until_complete(wrapper())
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/usr/local/lib/python3.10/dist-packages/uvloop/__init__.py", line 61, in wrapper
    return await main
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 552, in run_server
    async with build_async_engine_client(args) as engine_client:
  File "/usr/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 107, in build_async_engine_client
    async with build_async_engine_client_from_engine_args(
  File "/usr/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 194, in build_async_engine_client_from_engine_args
    raise RuntimeError(
RuntimeError: Engine process failed to start

jeff1jeffo

Oct 30, 2024

@Malini
You might need 4x A100 GPUs to get this running, or you could try --cpu-offload-gb.
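
A rough back-of-the-envelope estimate shows why a single A100 is unlikely to be enough. The sketch below assumes about 72 billion parameters (read off the "72B" in the model name, not a measured figure), 2 bytes per parameter for 16-bit weights, and a hypothetical 80% of each GPU's memory left for weights. It ignores the KV cache, activations, and any rounding of the tensor-parallel size, so the result should be read as a lower bound, not a sizing guide.

```python
import math


def weight_memory_gb(num_params, bytes_per_param=2):
    """Memory needed for the weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


def min_gpus_for_weights(num_params, gpu_memory_gb,
                         bytes_per_param=2, usable_fraction=0.8):
    """Smallest GPU count whose combined usable memory holds the weights.

    usable_fraction is an illustrative assumption for the share of each
    GPU that can be spent on weights rather than cache and activations.
    """
    usable_per_gpu_gb = gpu_memory_gb * usable_fraction
    needed_gb = weight_memory_gb(num_params, bytes_per_param)
    return math.ceil(needed_gb / usable_per_gpu_gb)


NUM_PARAMS = 72e9  # assumption: taken from the "72B" in the model name

print(f"weights alone: {weight_memory_gb(NUM_PARAMS):.0f} GB")
for gpu_gb in (40, 80):  # the two A100 memory sizes
    count = min_gpus_for_weights(NUM_PARAMS, gpu_gb)
    print(f"A100 {gpu_gb} GB: at least {count} GPUs")
```

Under these assumptions the weights alone exceed the memory of any single A100, which is consistent with the advice above to use several GPUs or to offload part of the weights to CPU memory.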
