# Qwen3.5-27B-Claude-4.6-Opus-Distilled-32k
# Model Introduction
Qwen3.5-27B-Claude-4.6-Opus-Distilled-32k is a reasoning and coding model fine-tuned on top of the Qwen3.5-27B hybrid dense architecture. Its core goal is to leverage Chain-of-Thought (CoT) distillation sourced primarily from Claude 4.6 Opus interactions, with a specialized focus on extended output generation and improved Luau programming capability.
Through Supervised Fine-Tuning (SFT) on structured reasoning traces with a 32,768-token maximum output length, the model excels at breaking down complex problems, planning step-by-step methodologies inside strictly formatted `<think>` tags, and delivering comprehensive, nuanced solutions, even for very long generation tasks.
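To illustrate the output contract described above, here is a minimal sketch of how downstream code might split a completion into its reasoning trace and final answer. The regex and function names are our own illustration, not part of the model's tooling:

```python
import re

# Assumed output contract: "<think> {internal reasoning} </think>\n{final answer}".
THINK_RE = re.compile(r"<think>(.*?)</think>\s*(.*)", re.DOTALL)

def split_completion(text: str) -> tuple[str, str]:
    """Split a model completion into (reasoning, answer).

    Falls back to an empty reasoning trace if no <think> block is found.
    """
    m = THINK_RE.search(text)
    if m is None:
        return "", text.strip()
    return m.group(1).strip(), m.group(2).strip()

reasoning, answer = split_completion(
    "<think> 2 + 2: add the units. </think>\nThe answer is 4."
)
# reasoning == "2 + 2: add the units."; answer == "The answer is 4."
```

In practice you would apply this to the decoded text after generation; keeping the reasoning trace separate makes it easy to hide or log it independently of the final answer.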
# Benchmark
| Benchmark | Baseline (27B) | Distilled (27B) | Jackrong (27B) |
|---|---|---|---|
| GPQA Diamond CoT (0-shot) | 55.05 | 69.69 | 67.67 |
| ARC-Challenge (25-shot) | 73.80 | 74.65 | 74.39 |
| AIME 2026 (0-shot) | 0.0 | 16.66 | 33.33 |
| AIME 2025 (0-shot) | 0.0 | 23.33 | 26.66 |
| MMLU-CF (0-shot) | 72.30 | 73.75 | - |
| MMLU-CF: Humanities | 78.28 | 79.80 | - |
| MMLU-CF: Social Sciences | 70.83 | 71.91 | - |
| MMLU-CF: STEM | 65.74 | 67.25 | - |
| MMLU-CF: Other | 74.35 | 76.02 | - |
| IFEval (0-shot) | 38.13 | 38.13 | 38.81 |
| IFEval: Prompt-Level | 31.05 | 31.05 | 31.7 |
| IFEval: Instruction-Level | 45.20 | 45.20 | 45.92 |
Benchmarks were run in 4-bit using the lm-evaluation-harness (`lm_eval`), with no chat template enabled. Higher scores are better.
# Training Pipeline Overview
```
Base Model (Qwen3.5-27B-FP8)
        │
        ▼
Supervised Fine-Tuning (SFT) + LoRA (r=64, α=128)
  (Response-Only Training masked on "<|im_start|>assistant\n")
  (Max 32k Output Length)
        +
  nohurry/Opus-4.6-Reasoning-3000x-filtered + Luau coding samples
  (shuffled)
        │
        ▼
Final Model (Qwen3.5-27B-Claude-4.6-Opus-Distilled-32k)
```
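The LoRA step in the diagram leaves each base weight matrix W frozen and learns only a low-rank update, so the effective weight at inference is W + (α/r)·B·A. The following toy pure-Python sketch illustrates the merge; the dimensions are chosen purely for illustration (the actual run uses r=64, α=128 on 27B-scale weights):

```python
# Toy LoRA merge: W_eff = W + (alpha / r) * (B @ A), where B (d_out x r)
# and A (r x d_in) are the only trained parameters.

def matmul(X, Y):
    """Plain-Python matrix multiply for small illustrative matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def lora_effective_weight(W, A, B, r, alpha):
    """Merge a LoRA adapter into a frozen base weight matrix."""
    scale = alpha / r
    delta = matmul(B, A)  # low-rank update, rank <= r
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight (2x2 identity)
B = [[1.0], [0.0]]             # d_out x r, with r = 1
A = [[0.0, 1.0]]               # r x d_in
W_eff = lora_effective_weight(W, A, B, r=1, alpha=2)  # scale = 2.0
# W_eff == [[1.0, 2.0], [0.0, 1.0]]
```

Because only A and B are trained, the adapter is a tiny fraction of the base model's parameters, which is what makes the short, cheap training run reported below feasible.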
# Supervised Fine-Tuning (SFT) Details
- Objective: To inject high-density reasoning logic, establish a strict internal thinking format prior to output, and train the model to sustain coherent generation over exceptionally long contexts.
- Extended Output Capacity: Trained specifically to handle up to 32,768 (32k) tokens of maximum output (recommended), allowing for massive codebases, comprehensive essays, and deeply detailed reasoning traces.
- LoRA Configuration: Fine-tuned efficiently using LoRA (16-bit) with Rank (r) set to 64 and Alpha (α) set to 128, ensuring strong adaptation and retention of complex Opus-level logic.
- Method: Utilized Unsloth for highly efficient memory and compute optimization. A critical component was the `train_on_responses_only` strategy, which masks the instruction tokens so the loss is computed purely over the generated `<think>` sequences and the subsequent solutions.
- Format Enforcement: All training samples were systematically normalized so the model strictly abides by the structure `<think> {internal reasoning} </think>\n{final answer}`.
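The response-only masking described above can be sketched in plain Python: everything up to and including the tokenized assistant marker gets the standard Hugging Face ignore index of -100, so only response tokens contribute to the loss. The marker-matching logic here is our own illustration with toy token IDs, not Unsloth's actual implementation:

```python
IGNORE_INDEX = -100  # HF convention: label -100 contributes nothing to the loss

def mask_instruction_tokens(input_ids, marker_ids):
    """Return labels with everything up to and including the last
    occurrence of the assistant marker (e.g. "<|im_start|>assistant\n"
    once tokenized) masked out, so loss covers only the response."""
    labels = list(input_ids)
    n, m = len(input_ids), len(marker_ids)
    last_end = 0
    for i in range(n - m + 1):
        if input_ids[i:i + m] == marker_ids:
            last_end = i + m
    for i in range(last_end):
        labels[i] = IGNORE_INDEX
    return labels

# Toy vocabulary: 9 stands for the assistant marker token.
labels = mask_instruction_tokens([1, 2, 9, 5, 6], marker_ids=[9])
# labels == [-100, -100, -100, 5, 6]
```

If no marker is found, the sequence is left unmasked; a production pipeline would typically drop or flag such malformed samples instead.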
# Datasets Used
The dataset consists of highly curated, filtered reasoning distillation data, supplemented by specialized coding sets:
| Dataset Name | Description / Purpose |
|---|---|
| nohurry/Opus-4.6-Reasoning-3000x-filtered | Provides comprehensive, high-quality Claude 4.6 Opus reasoning trajectories. |
| Custom Luau Coding Set | 75 meticulously crafted Luau coding samples generated natively by Opus 4.6, injecting specialized, high-quality domain knowledge for Roblox/Luau scripting. |
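Both sources are normalized into the enforced `<think> ... </think>` target format and shuffled together before training. A minimal sketch of that mixing step follows; the record fields (`prompt`, `reasoning`, `answer`) and the seed are hypothetical placeholders, since the actual dataset schemas are not published here:

```python
import random

def to_training_text(sample: dict) -> str:
    """Render one record into the ChatML-style training format with the
    enforced "<think> ... </think>\n{answer}" assistant target."""
    return (
        f"<|im_start|>user\n{sample['prompt']}<|im_end|>\n"
        f"<|im_start|>assistant\n"
        f"<think> {sample['reasoning']} </think>\n{sample['answer']}<|im_end|>"
    )

def build_mixture(reasoning_set, luau_set, seed=0):
    """Merge the reasoning and Luau pools, then shuffle deterministically."""
    merged = [to_training_text(s) for s in reasoning_set + luau_set]
    random.Random(seed).shuffle(merged)
    return merged

data = build_mixture(
    [{"prompt": "p", "reasoning": "r", "answer": "a"}],
    [{"prompt": "q", "reasoning": "s", "answer": "b"}],
)
```

Normalizing before shuffling guarantees every sample, regardless of source, exposes the same assistant-marker boundary that the response-only masking relies on.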
# Training Compute & Loss Curve
- Hardware: 1x NVIDIA H100 (80GB)
- Training Duration: ~50 Minutes
- Estimated Total Cost: $12.00
- Distillation Efficacy: The loss curve demonstrated a strong, healthy downward trajectory throughout the run, confirming successful knowledge transfer from the Opus teacher model. The model converged steadily from an initial loss of 0.588880 down to a final loss of 0.176861.
# Core Skills & Capabilities
- Massive Output Generation: Capable of sustaining coherent, high-quality output for up to 32k tokens, making it ideal for writing extensive code, documentation, or deep analytical reports in a single shot.
- Modular & Structured Thinking: Inheriting traits from Opus-level reasoning, the model confidently parses prompts and outlines plans sequentially in its `<think>` block, avoiding exploratory "trial-and-error" self-doubt.
- Luau Proficiency: Thanks to the targeted 75-sample dataset, the model exhibits improved syntax adherence and logic formulation for the Luau programming language.
# Limitations & Intended Use
- Hallucination Risk: While reasoning is strong, the model remains an autoregressive LLM. Extended 32k outputs may drift or hallucinate external facts; claims about the real world should be independently verified, since the model has no retrieval or grounding.
- Intended Scenario: Best suited for offline analytical tasks, heavy coding (especially Luau), math, and logic-dependent prompting where the user needs transparent internal logic and extremely long, continuous outputs.
# Acknowledgements
This model's development was made possible by the foundational tools and contributions from the broader AI ecosystem:
- Unsloth AI: For their state-of-the-art framework, enabling highly efficient, memory-optimized LoRA tuning and seamless 32k context scaling.
- Qwen Team: For engineering the robust and highly capable `Qwen3.5-27B` dense base architecture.
- Dataset Contributors: Special recognition to `nohurry` for the rigorous curation of the Claude 4.6 Opus reasoning trajectories, which serve as the core cognitive engine for this project's SFT phase.