Qwen3-4B-Reasoning-Backfill-v0.1
Experimental reasoning-trace backfiller fine-tuned from Qwen/Qwen3-4B

Overview

This is an experimental model trained to reconstruct a plausible chain of reasoning connecting a user-provided INSTRUCTION to a fixed SOLUTION while preserving the original answer. It focuses entirely on the route, not rewriting the destination, producing stepwise “thinking” traces that align with the target output. The goal is to enable reasoning backfill for legacy or non-reasoning datasets where collecting thought processes directly is impractical, such as older chat logs or instruction corpora. These traces can help bootstrap process-supervision signals, support teacher-style models, and deepen auditability of output behavior.

I would love to try this with larger or more stylized models to see how feasible it is to produce stylized or limited-domain reasoning traces. If you’re a compute partner interested in scaling this line of work to larger backfill models, let’s talk.

Train setup

  • Base: Qwen/Qwen3-4B
  • Hardware: 1× H100
  • Epochs: 4 · Cosine schedule · warmup 40
  • Optimizer: adamw_bnb_8bit · lr 2.5e-5

Intended Uses

  • Dataset augmentation: generate process-supervision style traces for older instruction pairs (see the sketch after this list).
  • Teacher bootstrapping: seed trace-rich examples to train or distill teachers.
  • Analysis tooling: produce rationales for audit of solution adherence.
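
As a rough illustration of the dataset-augmentation use case, here is a minimal sketch (not an official script) that wraps legacy instruction/answer pairs in the special tokens described under Prompting below. The dataset path, split, column names, and the backfill_user_turn field are placeholders; tatsu-lab/alpaca is used only because it has a simple instruction/output layout.

# Illustrative sketch: build backfill user turns from a legacy instruction corpus.
# Dataset path and column names ("instruction" / "output") are placeholders.
from datasets import load_dataset

legacy = load_dataset("tatsu-lab/alpaca", split="train[:100]")

def to_backfill_turn(row):
    # Wrap the original pair in the instruction/solution tokens expected by the model.
    row["backfill_user_turn"] = (
        f"<|instruction_start|>\n{row['instruction']}\n<|instruction_end|>\n\n"
        f"<|solution_start|>\n{row['output']}\n<|solution_end|>"
    )
    return row

prompts = legacy.map(to_backfill_turn)
print(prompts[0]["backfill_user_turn"])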

Limitations

  • Traces are plausible reconstructions, not ground truth cognition.
  • Model can over-rationalize if solution is underspecified.

Prompting · ChatML

The model expects the instruction and the fixed solution to be wrapped in <|instruction_start|>…<|instruction_end|> and <|solution_start|>…<|solution_end|> tokens respectively. The model should render its output inside <|thinking_start|>…<|thinking_end|> tokens, which helps with parsing and with identifying malformed outputs.

Example Prompt
<|im_start|>system
Your role as an assistant involves thoroughly reconstructing a plausible reasoning process that leads from a user-provided INSTRUCTION to a user-provided SOLUTION. You must not alter the SOLUTION. Each step should include concrete decisions and validations, such as: interpreting the INSTRUCTION, extracting constraints, selecting an approach, justifying key choices, verifying that intermediate results remain consistent with the provided SOLUTION, refining any errors, and a final consistency check noting any residual ambiguities. Use domain-appropriate specifics and avoid filler. Do not introduce new facts that would change the SOLUTION. Now, given an INSTRUCTION and a SOLUTION, reconstruct the Thought and present the Solution per the above guidelines. Your reasoning should begin and end with '<|thinking_start|>' and '<|thinking_end|>' respectively.
<|im_end|>
<|im_start|>user
<|instruction_start|>
What are some tips for reducing stress at work? Your response should contain at least 4 bullet points.
Use markdown bullets like "* point". Include the keyword "mindfulness" twice.
<|instruction_end|>

<|solution_start|>
* Practice mindfulness during breaks ...
* Prioritize tasks and set boundaries ...
* Incorporate mindfulness into routine activities ...
* Stay physically active ...
<|solution_end|>
<|im_end|>
<|im_start|>assistant
<|thinking_start|>
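
A minimal sketch of how one might drive this format with the transformers library follows. The helper name, the truncated SYSTEM_PROMPT, the loading options, and the sampling values are assumptions to adapt, not part of the released model's API; paste in the full system message from the Example Prompt above before using it.

import re
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "joeyzero/Qwen3-4B-Reasoning-Backfill-v0.1"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto", device_map="auto")

# Abbreviated here; use the full system message from the Example Prompt above.
SYSTEM_PROMPT = "Your role as an assistant involves thoroughly reconstructing ..."

def backfill_trace(instruction: str, solution: str, max_new_tokens: int = 1024) -> str:
    # Assemble the ChatML prompt exactly as in the Example Prompt, pre-filling
    # the assistant turn with <|thinking_start|>.
    prompt = (
        f"<|im_start|>system\n{SYSTEM_PROMPT}\n<|im_end|>\n"
        f"<|im_start|>user\n"
        f"<|instruction_start|>\n{instruction}\n<|instruction_end|>\n\n"
        f"<|solution_start|>\n{solution}\n<|solution_end|>\n<|im_end|>\n"
        f"<|im_start|>assistant\n<|thinking_start|>\n"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(
        **inputs,
        do_sample=True,
        temperature=0.8,  # see Recommended Sampling below
        top_p=0.9,
        max_new_tokens=max_new_tokens,
    )
    completion = tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=False)
    # Everything before <|thinking_end|> is the trace; a missing end tag is treated as malformed.
    match = re.search(r"(.*?)<\|thinking_end\|>", completion, re.DOTALL)
    if match is None:
        raise ValueError("Malformed output: <|thinking_end|> not found")
    return match.group(1).strip()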

Recommended Sampling

temperature: 0.7–1.0
top_p: 0.9
min_p: 0.05
max_tokens: as needed for trace length
stop: ["<|im_end|>"]

For tighter adherence, drop temperature toward 0.5–0.7.
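
If you serve the model with vLLM (one option among many; this path is an assumption, not an officially tested setup), the recommendations above map directly onto its sampling parameters:

# Sketch: the recommended settings expressed as vLLM SamplingParams.
# prompt_text is a truncated placeholder; assemble it as in the Example Prompt.
from vllm import LLM, SamplingParams

llm = LLM(model="joeyzero/Qwen3-4B-Reasoning-Backfill-v0.1")

params = SamplingParams(
    temperature=0.8,   # 0.7-1.0; drop toward 0.5-0.7 for tighter adherence
    top_p=0.9,
    min_p=0.05,
    max_tokens=2048,   # size to the trace length you need
    stop=["<|im_end|>"],
)

prompt_text = "<|im_start|>system\n...<|im_end|>\n<|im_start|>user\n...<|im_end|>\n<|im_start|>assistant\n<|thinking_start|>\n"
outputs = llm.generate([prompt_text], params)
print(outputs[0].outputs[0].text)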

Quantizations

  • GGUF: coming soon

Axolotl config

base_model: Qwen/Qwen3-4B
hub_model_id: joeyzero/Qwen3-4B-Reasoning-Backfill-V0.1
hf_use_auth_token: true

load_in_8bit: false
load_in_4bit: false
strict: false

gradient_accumulation_steps: 2
micro_batch_size: 2
num_epochs: 4
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 2.5e-5
max_grad_norm: 1.0
bf16: auto
tf32: false

datasets:
  - path: joeyzero/OpenThought-144k-Backfill-0.2
    type: chat_template
    field_messages: messages
  - path: joeyzero/dolphin-r1-backfill-0.0.2
    type: chat_template
    field_messages: messages

chat_template: chatml
dataset_prepared_path: prepared_data2
output_dir: ./thinking-backfill-0.1.17

sequence_len: 1024
sample_packing: true
pad_to_sequence_len: true
xformers_attention:
flash_attention: true
warmup_steps: 40
save_steps: 0.5

weight_decay: 0.02
wandb_project: reasoning-backfill
wandb_name: reasoning-backfill-attempt-04

Made by joeyzero · contributions and issues welcome.