Request for Unquantized Model Weights or Separate LoRA Adapter

#5
by christianweyer - opened

Hey @dousery !

The current model weights are distributed in BitsAndBytes 4-bit format (load_in_4bit=True with NF4), which makes them incompatible with conversion tools such as llama.cpp's convert_hf_to_gguf.py.

Technical Details:

  • When attempting conversion, the script fails with: KeyError: ('bnb', '4bit')
  • The error occurs in gguf-py/gguf/lazy.py when it encounters BitsAndBytes-quantized tensors
  • llama.cpp's conversion pipeline doesn't support dequantizing the BitsAndBytes format (see the loading sketch below)
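
For context, here is a minimal loading sketch showing why every linear layer ends up as a bnb-packed 4-bit tensor rather than a plain FP16/BF16 tensor. The repo id is a placeholder, not the actual model id:

```python
# Hedged sketch -- repo id below is a placeholder.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,               # matches how the checkpoint is distributed
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "your-org/gpt-oss-20b-medical",  # placeholder repo id
    quantization_config=bnb_config,
    device_map="auto",
)

# The saved tensors carry BitsAndBytes quant_state metadata, which is what
# convert_hf_to_gguf.py trips over with KeyError: ('bnb', '4bit').
```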

Request: Could you please provide one of the following:

  1. Unquantized weights (FP16/BF16) of the fine-tuned model
  2. Separate LoRA adapter weights that can be applied to the base openai/gpt-oss-20b model

Either option would enable conversion to other formats (GGUF, etc.) for broader inference framework compatibility.

Use Case: Deploying this medical reasoning model with llama.cpp for efficient CPU/GPU inference.

Additional Context: The model card indicates this was LoRA fine-tuned with rank 8 and 3.98M trainable parameters. If the LoRA adapter is available separately, that would be the most space-efficient option.
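
If the pre-merge PEFT checkpoint is still on disk, producing either deliverable should only take a few lines with peft. The paths and output names below are illustrative, not taken from this repo:

```python
# Hedged sketch, assuming the pre-merge PEFT checkpoint (base + adapter) still exists.
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "openai/gpt-oss-20b",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
peft_model = PeftModel.from_pretrained(base, "path/to/lora-checkpoint")  # illustrative path

# Option 2: publish only the LoRA adapter (a few MB at rank 8)
peft_model.save_pretrained("gpt-oss-20b-medical-lora")

# Option 1: merge the adapter into the base model and save full BF16 weights,
# which convert_hf_to_gguf.py can handle
merged = peft_model.merge_and_unload()
merged.save_pretrained("gpt-oss-20b-medical-bf16", safe_serialization=True)
```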

Thanks!

Hi,

I’m unable to provide either of the requested options. The fine-tuned model was trained and saved in 4-bit quantized format due to hardware constraints, and the LoRA adapters were merged into the full model. Therefore, separate FP16/BF16 weights or standalone LoRA adapters are not available.

Thanks for understanding.

christianweyer changed discussion status to closed

Hey! I updated my model. You can access the LoRA adapter now.
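
For anyone else who lands here, a hedged sketch of applying the adapter on top of the base model. The adapter repo id is a placeholder; check the model card for the actual one:

```python
# Hedged usage sketch -- adapter repo id is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "openai/gpt-oss-20b",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-20b")

model = PeftModel.from_pretrained(base, "dousery/<adapter-repo>")  # placeholder adapter id

# For llama.cpp: merge, export BF16 weights, then run convert_hf_to_gguf.py on the folder.
merged = model.merge_and_unload()
merged.save_pretrained("gpt-oss-20b-medical-merged")
tokenizer.save_pretrained("gpt-oss-20b-medical-merged")
```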
