Comfy and Quants for local inference?

#2
opened by PabloFG

That would be totally awesome and would make the model a lot more popular.

Alpha-VLLM org

Thanks for the suggestion. However, quantizing the model would affect our image generation quality to some extent. We'll release a working Hugging Face Space in the next few days showcasing multiple tasks, including T2I (text-to-image) and I2T (image-to-text), and demonstrating the strong potential of the DLLM generation paradigm for interactive creation.

Great.
Is there any plan for ComfyUI support?

ComfyUI when?

Of course, quantizing always affects image generation quality. The people who want quantized weights know that. But it's the difference between being able to use it practically or not for many people. A slightly less proficient model is better than not being able to use it at all.

I mean, someone else will quantize it eventually, so if you don't want to spend time on it, that's fine. Just mentioning that people know about the tradeoffs, and if the model is competitive, it's definitely going to happen anyway.

You can release both quantized and full models that will be usable in ComfyUI; FP16, BF16, Q8, Q6, and Q4 would be great.
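
For what it's worth, community quantization usually goes through the standard bitsandbytes integration. Below is a minimal sketch of what a 4-bit load might look like; whether Lumina-DiMOO's repository layout and custom modeling code actually support this path is untested, so treat the class choice and arguments as assumptions rather than a confirmed recipe.

import torch
from transformers import AutoModel, BitsAndBytesConfig

# Hypothetical 4-bit load via bitsandbytes; not verified against
# Lumina-DiMOO's custom modeling code.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # NF4 weights, roughly 4x smaller than BF16
    bnb_4bit_compute_dtype=torch.bfloat16,  # matmuls still run in BF16
)

model = AutoModel.from_pretrained(
    "Alpha-VLLM/Lumina-DiMOO",
    trust_remote_code=True,
    quantization_config=bnb_config,
    device_map="auto",                      # place layers across available devices
)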

Just in case you guys weren't aware already: loading diffusers models is already supported in ComfyUI. This model is 8B and is already BF16 according to the tags, so most of you won't need a quantized version. Keep in mind that with diffusers models, the total size of the files doesn't necessarily equate to how much of your GPU's VRAM gets used.
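
As a rough back-of-the-envelope check (weights only; actual VRAM use also depends on activations, caches, and the VQ-VAE), an 8B-parameter model needs about 15 GiB in BF16 and under 4 GiB at 4-bit:

# Rough weight-memory estimate; real usage adds activations, caches, etc.
params = 8e9  # ~8B parameters

for name, bytes_per_param in [("bf16", 2), ("q8", 1), ("q4", 0.5)]:
    gib = params * bytes_per_param / 1024**3
    print(f"{name}: ~{gib:.1f} GiB for weights alone")
# bf16: ~14.9 GiB, q8: ~7.5 GiB, q4: ~3.7 GiB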

So I tried my own advice by using this model with the diffusers model loader, but it returned an AttributeError. I'm way too tired to figure out exactly what's wrong, but it's possible that it's as simple as the model not being compatible (it's both an LLM and an image generator/interpreter). Might try further tomorrow. Sorry if my comment ended up wasting someone's time. The rest of what I said still stands, though.

Yeah, tried Diffusers and it didn't work. It wasn't going to be that simple... :(

Alpha-VLLM org

Hi, thanks for trying it! Before it's merged into the main Diffusers repo, you can run it with our fork.
First, install diffusers from the fork:

git clone https://github.com/qianyu-dlut/diffusers
cd diffusers
pip install -e .
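
A quick sanity check that the editable install is the one Python actually picks up (the printed path should point into your cloned fork):

import diffusers

# Should print the fork's version and a path inside your clone,
# not a previously installed release.
print(diffusers.__version__)
print(diffusers.__file__)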

Then, load the pipeline:

import torch
from diffusers import VQModel, DiffusionPipeline
from transformers import AutoTokenizer

device = "cuda"

# VQ-VAE that maps between pixels and discrete image tokens
vqvae = VQModel.from_pretrained(
    "Alpha-VLLM/Lumina-DiMOO",
    subfolder="vqvae"
).to(device=device, dtype=torch.bfloat16)

# tokenizer shipped with the repo's custom (remote) code
tokenizer = AutoTokenizer.from_pretrained(
    "Alpha-VLLM/Lumina-DiMOO",
    trust_remote_code=True
)

pipe = DiffusionPipeline.from_pretrained(
    "Alpha-VLLM/Lumina-DiMOO",
    vqvae=vqvae,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    # use local custom pipeline until it’s merged upstream
    custom_pipeline="path/to/diffusers/examples/community/lumina_dimoo.py",
).to(device)
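
From there, a text-to-image call would presumably follow the usual diffusers calling convention. The snippet below is a hypothetical sketch; the community pipeline's actual parameter names and output structure may differ, so check examples/community/lumina_dimoo.py in the fork for the real signature.

# Hypothetical T2I call; argument names and outputs are assumptions.
prompt = "a watercolor painting of a lighthouse at dawn"
result = pipe(prompt=prompt)
result.images[0].save("lighthouse.png")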

More usage examples here.
