QTM7-4B
QTM7-4B is a proof-of-concept math and code reasoning model, briefly finetuned from Qwen/Qwen3-4B-Base.
It was finetuned for ~4 hours on a single A100 GPU, using lightweight datasets focused on mathematical reasoning and structured problem solving.
This project demonstrates what can be achieved on a minimal compute budget (≈$20 total cost).
UPDATE: Observed Performance Shift
This model was trained on math and code datasets with the goal of improving structured reasoning over the Qwen3-4B base model. Quantitative GSM8K metrics do show the expected gains in math ability, but recent qualitative testing suggests an unexpected side effect:
QTM7-4B exhibits noticeably stronger performance on creative writing, narrative generation, and descriptive tasks than the Qwen3-4B base model.
The focused finetuning appears to have improved the model's handling of complex instructions and structure, and this carries over into more cohesive and evocative creative content.
Model Details
- Developed by: Independent researcher (solo project)
- Funding: Self-funded (~$20 total compute cost)
- Model type: Decoder-only transformer for text generation
- Language(s): English
- License: Apache-2.0
- Finetuned from: Qwen/Qwen3-4B-Base
Sources
- Repository: Ma7ee7/QTM7-4b-2hr-checkpoint
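A minimal loading sketch, assuming the Hub checkpoint is in standard Transformers format; the prompt and decoding settings are illustrative, not recommendations:

```python
# Minimal inference sketch; prompt and generation settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Ma7ee7/QTM7-4b-2hr-checkpoint"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "A train covers 60 km in 45 minutes. What is its average speed in km/h?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```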
Uses
Direct Use
- Research into math & code reasoning
- Proof-of-concept for low-budget finetuning on large language models
- New focus: evaluating how low-resource finetuning affects creative writing and narrative coherence.
Downstream Use
- Potential basis for math problem solvers or code reasoning assistants
- Experiments in lightweight alignment or evaluation pipelines
Out-of-Scope
- Not suitable for safety-critical, legal, or medical applications
- Not RLHF-aligned; outputs may be unfiltered or ungrounded
Bias, Risks, and Limitations
- Inherits biases from Qwen3-4B-Base
- Untested on broader NLP benchmarks (MMLU, ARC, etc.)
- Training was short (~2 hours of net training within ~4 total GPU-hours), so coverage is shallow
- General conversational ability remains base-model level
Recommendation: Treat outputs as experimental. Do not deploy in production or decision-making contexts.
Training Details
Training Data
- unsloth/OpenMathReasoning-mini — math reasoning dataset
- nvidia/OpenCodeReasoning — code reasoning tasks
- No GSM8K contamination was found in either the training or post-training data.
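A quick data-inspection sketch for the two datasets. The "split_0" config name for nvidia/OpenCodeReasoning is an assumption; verify the exact configs, splits, and column names on each dataset card before building prompts:

```python
# Dataset inspection sketch; "split_0" for nvidia/OpenCodeReasoning is an
# assumption, so check configs/splits/columns on the dataset cards first.
from datasets import load_dataset

math_ds = load_dataset("unsloth/OpenMathReasoning-mini")
code_ds = load_dataset("nvidia/OpenCodeReasoning", "split_0")

print(math_ds)   # splits and columns for the math reasoning traces
print(code_ds)   # splits and columns for the code reasoning tasks
```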
Procedure
- Mixed precision: fp16
- Optimizer: AdamW (standard defaults)
- Duration: ~4 hours on 1x NVIDIA A100
- Checkpoint size: ~16 GB (fp16)
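A training-setup sketch under the settings listed above (fp16 mixed precision, AdamW defaults, a single GPU). The inline toy dataset, batch size, and sequence length are illustrative stand-ins, not the exact configuration used for this checkpoint:

```python
# Training-setup sketch: fp16 mixed precision, AdamW defaults, single GPU.
# The toy dataset and hyperparameters below are illustrative stand-ins.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base_id = "Qwen/Qwen3-4B-Base"
tokenizer = AutoTokenizer.from_pretrained(base_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_id)

# Stand-in for the formatted math/code reasoning examples.
train_ds = Dataset.from_dict({
    "text": [
        "Problem: What is 17 * 24?\nSolution: 17 * 24 = 408. Answer: 408",
        "Problem: Write a function that reverses a string.\n"
        "Solution: def rev(s): return s[::-1]",
    ]
})
tokenized = train_ds.map(
    lambda b: tokenizer(b["text"], truncation=True, max_length=1024),
    batched=True, remove_columns=["text"],
)

args = TrainingArguments(
    output_dir="qtm7-4b-sft",
    fp16=True,                      # mixed precision, as noted above
    optim="adamw_torch",            # AdamW with standard defaults
    per_device_train_batch_size=1,  # illustrative; sized for one A100
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    logging_steps=50,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```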
Evaluation
Setup
- Compared against Qwen/Qwen3-4B (post-trained version)
- Dataset: GSM8K test split (subset of 300 “hard” problems)
- Metrics: Exact match on final numeric answer
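A scoring sketch for the exact-match metric, assuming GSM8K-style references where the gold answer follows "####"; the number-extraction heuristic is an illustration, not necessarily the parser used for these results:

```python
# Exact-match scoring sketch: compare the last number in a generation with
# the gold answer that GSM8K stores after "####".
import re

def final_number(text: str):
    """Return the last number in the text, with thousands separators removed."""
    nums = re.findall(r"-?\d+\.?\d*", text.replace(",", ""))
    return nums[-1] if nums else None

def exact_match(prediction: str, reference: str) -> bool:
    gold = reference.split("####")[-1].strip().replace(",", "")
    pred = final_number(prediction)
    return pred is not None and float(pred) == float(gold)

print(exact_match("The trip takes 3 hours, so the answer is 42.", "#### 42"))  # True
```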
Results
Training Loss Curve
The training loss converged stably toward ~0.63 by step 1750, even as difficulty increased.
GSM8K Accuracy (Sampled)
QTM7-4B* scored ~80.7% vs Qwen3-4B’s ~28.0%.
Head-to-Head Outcomes
QTM7-4B* won most direct comparisons across the 300-problem subset:
- Only QTM7-4B* correct → 171
- Both correct → 71
- Both wrong → 45
- Only Qwen correct → 13
Outcome Breakdown by Model (GSM8K subset)
Side-by-side percentages for correct answers versus error types (answer mismatch or truncated output):
- QTM7-4B*: 80.7% correct, 7.3% mismatch, 12.0% truncated
- Qwen3-4B: 28.0% correct, 72.0% mismatch, 0% truncated
* QTM7-4B = 2hr checkpoint
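The head-to-head counts and the accuracy figures describe the same 300-problem subset and are mutually consistent, as this quick check shows:

```python
# The accuracy numbers follow directly from the head-to-head counts over the
# same 300-problem GSM8K subset.
only_qtm7, both_correct, both_wrong, only_qwen = 171, 71, 45, 13
total = only_qtm7 + both_correct + both_wrong + only_qwen   # 300

qtm7_acc = (only_qtm7 + both_correct) / total               # 242/300 ≈ 80.7%
qwen_acc = (only_qwen + both_correct) / total               #  84/300 = 28.0%
print(f"QTM7-4B: {qtm7_acc:.1%}   Qwen3-4B: {qwen_acc:.1%}")
```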
Environmental Impact
Estimated using MLCO2 Impact Calculator:
- Hardware: NVIDIA A100 (80GB)
- GPU hours: ~4
- Cloud Provider: Google Colab (us-central assumed)
- Carbon emitted: ≈ 1.2 kg CO2eq
(About the same as driving ~5 km in a gasoline car.)
Technical Specifications
- Architecture: Qwen3-4B transformer (4B params, decoder-only, rotary embeddings, SwiGLU, grouped query attention)
- Objective: Causal LM finetuning on reasoning tasks
- Software: PyTorch + Hugging Face Transformers + Datasets
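A quick way to confirm the architecture details listed above, assuming a recent Transformers release with Qwen3 support:

```python
# Architecture check: read the base model's config from the Hub.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("Qwen/Qwen3-4B-Base")
print(cfg.model_type)                                    # "qwen3"
print(cfg.num_hidden_layers)                             # decoder depth
print(cfg.num_attention_heads, cfg.num_key_value_heads)  # GQA: more query heads than KV heads
print(cfg.hidden_act)                                    # "silu" (SwiGLU MLP)
```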
Summary
QTM7-4B is a minimal-budget proof-of-concept showing that:
- Small compute can still move the needle on reasoning with focused datasets.
- Math reasoning gains were observed even with short finetunes.
- The model is not benchmarked broadly, but shows promise as a low-resource experiment.