Request for Unquantized Model Weights or Separate LoRA Adapter
Hey @dousery !
The current model weights are distributed with BitsAndBytes quantization (`load_in_4bit=True` with NF4 quantization), which makes them incompatible with conversion tools like llama.cpp's `convert_hf_to_gguf.py`.
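For context, a minimal sketch of how a checkpoint in this 4-bit NF4 format is typically loaded with transformers + bitsandbytes; the repo id below is a placeholder, not the actual model id:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Placeholder: the actual fine-tuned repo id on the Hub goes here.
FINETUNED_REPO = "..."

# NF4 4-bit configuration of the kind the checkpoint appears to be saved with.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# The resulting state dict holds bitsandbytes-packed tensors, which is exactly
# what convert_hf_to_gguf.py cannot dequantize.
model = AutoModelForCausalLM.from_pretrained(
    FINETUNED_REPO,
    quantization_config=bnb_config,
    device_map="auto",
)
```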
Technical Details:
- When attempting conversion, the script fails with `KeyError: ('bnb', '4bit')`
- The error occurs in `gguf-py/gguf/lazy.py` when encountering BitsAndBytes quantized tensors
- llama.cpp's conversion pipeline doesn't support dequantizing the BitsAndBytes format
Request: Could you please provide one of the following:
- Unquantized weights (FP16/BF16) of the fine-tuned model
- Separate LoRA adapter weights that can be applied to the base `openai/gpt-oss-20b` model
Either option would enable conversion to other formats (GGUF, etc.) for broader inference framework compatibility.
Use Case: Deploying this medical reasoning model with llama.cpp for efficient CPU/GPU inference.
Additional Context: The model card indicates this was LoRA fine-tuned with rank 8 and 3.98M trainable parameters. If the LoRA adapter is available separately, that would be the most space-efficient option.
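For illustration, this is roughly how a separately published adapter would be applied to the unquantized base with PEFT; the adapter repo id is a placeholder:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_REPO = "openai/gpt-oss-20b"
ADAPTER_REPO = "..."  # placeholder: the separately published LoRA adapter repo

# Load the base model unquantized (BF16), then attach the rank-8 LoRA adapter.
base = AutoModelForCausalLM.from_pretrained(BASE_REPO, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(BASE_REPO)
model = PeftModel.from_pretrained(base, ADAPTER_REPO)
```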
Thanks!
Hi,
I’m unable to provide either of the requested options. The fine-tuned model was trained and saved in 4-bit quantized format due to hardware constraints, and the LoRA adapters were merged into the full model. Therefore, separate FP16/BF16 weights or standalone LoRA adapters are not available.
Thanks for understanding.
Hey! I updated my model. You can access the LoRA adapter now.
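For anyone following along, a rough sketch (adapter repo id is a placeholder) of merging the published adapter into the unquantized base and exporting BF16 weights that `convert_hf_to_gguf.py` can read:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_REPO = "openai/gpt-oss-20b"
ADAPTER_REPO = "..."  # placeholder: the published LoRA adapter repo id

# Attach the adapter to the unquantized base, then fold the LoRA deltas into
# the base weights so the checkpoint contains plain BF16 tensors.
base = AutoModelForCausalLM.from_pretrained(BASE_REPO, torch_dtype=torch.bfloat16)
merged = PeftModel.from_pretrained(base, ADAPTER_REPO).merge_and_unload()

# Save standard safetensors that convert_hf_to_gguf.py can read.
out_dir = "gpt-oss-20b-medical-merged"
merged.save_pretrained(out_dir)
AutoTokenizer.from_pretrained(BASE_REPO).save_pretrained(out_dir)
```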
Very cool - thx @dousery !