Qwopus3.5-27B-v3-TQ3_4S

TQ3_4S is a 3.5-bit weight format based on the Walsh-Hadamard transform, with four per-8-weight scales in each 32-weight block.

This release is a TQ3_4S GGUF quantization of Jackrong/Qwopus3.5-27B-v3, which is itself derived from the Qwen3.5-27B family.

Quantization Source

  • HF source checkout:
    • Jackrong/Qwopus3.5-27B-v3
  • upstream family:
    • Qwen/Qwen3.5-27B
  • F16 GGUF used as the quantization source:
    • Qwopus3.5-27B-v3-f16.gguf

Quantized with:

# llama-quantize <input-gguf> <output-gguf> <type> [nthreads]
./build/bin/llama-quantize \
  /path/to/Qwopus3.5-27B-v3-f16.gguf \
  /path/to/Qwopus3.5-27B-v3-TQ3_4S.gguf \
  TQ3_4S \
  8
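Before uploading, it is worth a quick sanity check that the quantized file exists and loads for a short generation. A minimal sketch (paths and the test prompt are illustrative, not part of this release):

```shell
# Confirm the quantized GGUF was written and is loadable.
ls -lh /path/to/Qwopus3.5-27B-v3-TQ3_4S.gguf

# Short non-interactive generation as a load test.
./build/bin/llama-cli \
  -m /path/to/Qwopus3.5-27B-v3-TQ3_4S.gguf \
  -p "Hello" -n 16
```

If the model loads and emits a few tokens without errors, the quantization pass completed cleanly.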

Quality

Full pass over wiki.test.raw at a context of 2048:

  • Final PPL = 6.3433 +/- 0.03999
  • Median chunk PPL = 6.1953
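These numbers are the kind reported by llama.cpp's perplexity tool; a sketch of the invocation used to reproduce them (paths are illustrative):

```shell
# Full-pass perplexity over wiki.test.raw at context 2048.
./build/bin/llama-perplexity \
  -m /path/to/Qwopus3.5-27B-v3-TQ3_4S.gguf \
  -f /path/to/wiki.test.raw \
  -c 2048
```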

Runtime Validation

Validated on a clean public checkout of llama.cpp-tq3 main:

  • runtime commit: 62eb27dce
  • runtime requirement:
    • turbo-tan/llama.cpp-tq3
  • strict chat smoke:
    • prompt: Write ONLY the word ok.
    • response: ok
  • multimodal projector:
    • mmproj.gguf

Validated server profile:

./build/bin/llama-server \
  -m /path/to/Qwopus3.5-27B-v3-TQ3_4S.gguf \
  -mm /path/to/mmproj.gguf \
  -a qwopus35-27b-v3-tq3_4s \
  --host 127.0.0.1 --port 8080 \
  -ngl 99 -c 8192 -np 1 \
  -ctk q8_0 -ctv q8_0 -fa on \
  --no-warmup --jinja \
  --reasoning off --reasoning-budget 0 --reasoning-format deepseek \
  --cache-ram 0 --no-mmproj-offload
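Once the server is up, the strict chat smoke above can be reproduced against its OpenAI-compatible endpoint. A sketch, assuming the host/port and alias from the profile above:

```shell
curl -s http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwopus35-27b-v3-tq3_4s",
    "messages": [{"role": "user", "content": "Write ONLY the word ok."}]
  }'
```

The response body should contain a single assistant message whose content is `ok`.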

Recommended Chat Settings

For cleaner short-answer behavior on this reasoning-distilled model, the best local setting I found was:

--reasoning on --reasoning-budget 0 --temp 0.6 --top-k 20 --min-p 0 --repeat-penalty 1.0

On simple prompts, this suppresses visible thinking-tag spill more reliably than --reasoning off.
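When talking to llama-server rather than a local CLI, the sampler portion of these settings can also be sent per request; llama-server accepts extra sampling fields alongside the standard OpenAI ones. A sketch (the prompt is illustrative):

```shell
curl -s http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwopus35-27b-v3-tq3_4s",
    "messages": [{"role": "user", "content": "What is 2+2? Answer with one number."}],
    "temperature": 0.6,
    "top_k": 20,
    "min_p": 0,
    "repeat_penalty": 1.0
  }'
```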

Vision / Image Input

The repo includes mmproj.gguf for multimodal use.

If your frontend reports that image input is unsupported, it is usually talking to an older server process that was started without the mmproj (-mm / --mmproj) flag.
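A quick end-to-end check that image input is wired up is an OpenAI-style request with an embedded image, assuming the server was started with the mmproj as in the profile above. A sketch (the image path and prompt are illustrative; `base64 -w0` is the GNU coreutils form):

```shell
# Base64-encode a local image and send it as a data URI.
IMG=$(base64 -w0 /path/to/test.png)
curl -s http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwopus35-27b-v3-tq3_4s",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this image in one sentence."},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,'"$IMG"'"}}
      ]
    }]
  }'
```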

Notes

  • This is a weight quantization release for the Qwopus v3 model line.
  • Running this GGUF requires the TQ3_4S runtime in:
    • turbo-tan/llama.cpp-tq3

