Comfy and Quants for local inference?

#2
opened by PabloFG

That would be totally awesome and would make the model a lot more popular.

Alpha-VLLM org

Thanks for the suggestion. However, quantizing the model would affect our image generation quality to some extent. We'll release a working Hugging Face Space in the next few days showcasing multiple tasks, including T2I (text-to-image) and I2T (image-to-text), and demonstrating the strong potential of the DLLM generation paradigm for interactive creation.

Great.
Is there any plan for ComfyUI support?

ComfyUI when?

Of course, quantizing always affects image generation quality. The people who want quantized weights know that. But it's the difference between being able to use it practically or not for many people. A slightly less proficient model is better than not being able to use it at all.

I mean, someone else will quantize it eventually, so if you don't want to spend time on it, that's fine. Just mentioning that people know about the tradeoffs, and if the model is competitive, it's definitely going to happen anyway.

You can release both quantized and full models that will be usable in ComfyUI; FP16, BF16, Q8, Q6, and Q4 would be great.
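
For what it's worth, community quantization usually goes through the standard bitsandbytes integration. Below is a minimal sketch of what a 4-bit load might look like; whether Lumina-DiMOO's repository layout and custom modeling code actually support this path is untested, so treat the class choice and arguments as assumptions rather than a confirmed recipe.

import torch
from transformers import AutoModel, BitsAndBytesConfig

# Hypothetical 4-bit load via bitsandbytes; not verified against
# Lumina-DiMOO's custom modeling code.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # NF4 weights, roughly 4x smaller than BF16
    bnb_4bit_compute_dtype=torch.bfloat16,  # matmuls still run in BF16
)

model = AutoModel.from_pretrained(
    "Alpha-VLLM/Lumina-DiMOO",
    trust_remote_code=True,
    quantization_config=bnb_config,
    device_map="auto",                      # place layers across available devices
)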

Just in case you guys weren't aware already: loading diffusers models is already supported in ComfyUI. This model is 8B and is already BF16 according to the tags, so most of you won't need a quantized version. Keep in mind that with diffusers models, the total size of the files doesn't necessarily equate to how much of your GPU's VRAM gets used.
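
As a rough back-of-the-envelope check (weights only; actual VRAM use also depends on activations, caches, and the VQ-VAE), an 8B-parameter model needs about 15 GiB in BF16 and under 4 GiB at 4-bit:

# Rough weight-memory estimate; real usage adds activations, caches, etc.
params = 8e9  # ~8B parameters

for name, bytes_per_param in [("bf16", 2), ("q8", 1), ("q4", 0.5)]:
    gib = params * bytes_per_param / 1024**3
    print(f"{name}: ~{gib:.1f} GiB for weights alone")
# bf16: ~14.9 GiB, q8: ~7.5 GiB, q4: ~3.7 GiB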

So I tried my own advice by using this model with the diffusers model loader, but it returned an AttributeError. I'm way too tired to figure out exactly what's wrong, but it's possible that it's as simple as the model not being compatible (it's both an LLM and an image generator/interpreter). Might try further tomorrow. Sorry if my comment ended up wasting someone's time. The rest of what I said still stands, though.

Yeah, tried Diffusers and it didn't work. It wasn't going to be that simple... :(

Alpha-VLLM org

Hi, thanks for trying it! Before it's merged into the main Diffusers repo, you can run it with our fork.
First, install diffusers from the fork:

git clone https://github.com/qianyu-dlut/diffusers
cd diffusers
pip install -e .
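
A quick sanity check that the editable install is the one Python actually picks up (the printed path should point into your cloned fork):

import diffusers

# Should print the fork's version and a path inside your clone,
# not a previously installed release.
print(diffusers.__version__)
print(diffusers.__file__)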

Then, load the pipeline:

import torch
from diffusers import VQModel, DiffusionPipeline
from transformers import AutoTokenizer

device = "cuda"

# VQ-VAE that maps between pixels and discrete image tokens
vqvae = VQModel.from_pretrained(
    "Alpha-VLLM/Lumina-DiMOO",
    subfolder="vqvae"
).to(device=device, dtype=torch.bfloat16)

# tokenizer shipped with the repo's custom (remote) code
tokenizer = AutoTokenizer.from_pretrained(
    "Alpha-VLLM/Lumina-DiMOO",
    trust_remote_code=True
)

pipe = DiffusionPipeline.from_pretrained(
    "Alpha-VLLM/Lumina-DiMOO",
    vqvae=vqvae,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    # use local custom pipeline until it’s merged upstream
    custom_pipeline="path/to/diffusers/examples/community/lumina_dimoo.py",
).to(device)
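
From there, a text-to-image call would presumably follow the usual diffusers calling convention. The snippet below is a hypothetical sketch; the community pipeline's actual parameter names and output structure may differ, so check examples/community/lumina_dimoo.py in the fork for the real signature.

# Hypothetical T2I call; argument names and outputs are assumptions.
prompt = "a watercolor painting of a lighthouse at dawn"
result = pipe(prompt=prompt)
result.images[0].save("lighthouse.png")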

More usage examples here.
