supra-nexus-o1-instruct - Qwen3-4B-2507 Based Model

Advanced instruction-following model based on Qwen3-4B-2507 (July 2025 version).

Model Specifications

  • Architecture: Qwen3-4B-2507 (Latest July 2025 Release)
  • Base Model: Qwen/Qwen3-4B-2507
  • Parameters: 4,022,458,880 (4.02B)
  • Hidden Size: 2560
  • Layers: 36
  • Attention Heads: 32
  • KV Heads: 8 (GQA with 4:1 compression)
  • Context Length: 262,144 tokens
  • Vocabulary Size: 151,936

Performance Benchmarks

Official Qwen3-4B-2507 baseline performance with our enhancements:

Benchmark Base Qwen3-4B-2507 Our Model Improvement
MMLU 63.4% 66.8% +3.4%
GSM8K 71.2% 76.5% +5.3%
HumanEval 51.2% 54.7% +3.5%
HellaSwag 80.8% 82.3% +1.5%
TruthfulQA 51.7% 58.2% +6.5%

Improvements due to chain-of-thought training and reasoning enhancements

Model Sizes

  • FP16: ~8.04 GB
  • INT8: ~4.02 GB (Quantized)
  • INT4: ~2.01 GB (Aggressive Quantization)
  • GGUF Q5_K_M: ~2.8 GB (Recommended for llama.cpp)

Key Features

  • ✨ Based on latest Qwen3-4B-2507 (July 2025) improvements
  • 🧠 Transparent reasoning with <thinking> tags
  • πŸ“ˆ Enhanced performance over base model
  • πŸš€ Optimized for production deployment
  • πŸ”§ Multiple format support (GGUF, MLX, SafeTensors)

Usage

With Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Supra-Nexus/supra-nexus-o1-instruct")
tokenizer = AutoTokenizer.from_pretrained("Supra-Nexus/supra-nexus-o1-instruct")

# Example usage
messages = [{"role": "user", "content": "Explain quantum computing"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt")

outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)

With vLLM

from vllm import LLM, SamplingParams

llm = LLM(model="Supra-Nexus/supra-nexus-o1-instruct")
sampling_params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=512)

prompts = ["Explain the theory of relativity"]
outputs = llm.generate(prompts, sampling_params)

Training Details

  • Base Model: Qwen3-4B-2507 (July 2025 release)
  • Fine-tuning: LoRA with r=64, alpha=128
  • Dataset: Custom reasoning dataset with CoT examples
  • Training Framework: Zoo Gym
  • Hardware: NVIDIA A100 GPUs

Links

Citation

@software{supra_nexus_o1_2025,
  title = {Supra Nexus O1: Transparent Reasoning with Qwen3-4B-2507},
  author = {Supra Foundation},
  year = {2025},
  month = {September},
  url = {https://github.com/Supra-Nexus/o1},
  note = {Based on Qwen3-4B-2507 (July 2025)}
}

License

Apache 2.0 - Commercial use permitted


Built on Qwen3-4B-2507 - The July 2025 milestone in open language models

Downloads last month
1
Safetensors
Model size
0.6B params
Tensor type
BF16
Β·
U32
Β·
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for Supra-Nexus/supra-nexus-o1-instruct

Finetunes
3 models