Darwin-4B-Genesis
World's first Transformer ร Mamba evolutionary cross-architecture FFN breeding | CLIcK 92% | MuSR 70% | A 4B model outperforming 27B | CMA-ES 42-dimensional genome search | Hybrid Vigor demonstrated | Apache 2.0
What Is This?
Darwin-4B-Genesis is the 3rd generation Darwin model and the world's first model to successfully crossbreed FFN layers across different architectures โ Transformer (Gemma4) and Mamba (Qwen3.5 GatedDeltaNet) โ using evolutionary optimization.
The father's Attention layers (Gemma4 Transformer) are preserved at 100%, while the mother's FFN knowledge (Qwen3.5 Mamba) is transplanted at layer-specific optimal ratios discovered automatically by CMA-ES across 42 dimensions.
The result: the child outperforms both parents on every benchmark โ a phenomenon known as Hybrid Vigor.
Why This Matters
1. World First
Existing hybrid models (Jamba, Nemotron-H, Granite 4.0) are all designed and trained from scratch. Darwin-4B-Genesis takes two already-trained models from different architecture families and breeds them evolutionarily โ with zero additional training.
2. Hybrid Vigor Demonstrated
| Benchmark | David (Father) | Qwen3.5-4B (Mother) | Genesis (Child) |
|---|---|---|---|
| CLIcK | 90% | ~50% (est.) | 92% โ |
| MuSR | 65% | ~55% (est.) | 70% โ |
The child surpasses both parents. This is the first demonstration of Hybrid Vigor in AI model breeding.
3. Manual vs Evolution
| Method | CLIcK | MuSR |
|---|---|---|
| Manual 50% blend | ~23% | โ |
| Manual 30% selective blend | 62% | 45% |
| CMA-ES 42D automatic search | 92% | 70% |
Human-chosen ratios fail. Evolutionary search succeeds.
Benchmarks
| Benchmark | Genesis | David (Gen2) | K-AI #1 (27B) |
|---|---|---|---|
| CLIcK (Korean culture) | 92% | 90% | 0.794 |
| MuSR (multi-step reasoning) | 70% | 65% | 0.604 |
| GPQA (deep reasoning) | ~60% | ~60% | โ |
A 4B model dominates the K-AI leaderboard's #1 model (27B) on both CLIcK and MuSR.
How It Works
Cross-Architecture FFN Breeding
Father: Darwin-4B-David (Gemma4 Transformer, hidden=2560, 42 layers)
Mother: Qwen/Qwen3.5-4B (GatedDeltaNet/Mamba, hidden=2560, 32 layers)
Key insight: hidden_size matches (2560) โ direct FFN replacement possible
Method: Attention 100% from Father, FFN blended at per-layer optimal ratios
Optimizer: CMA-ES (Covariance Matrix Adaptation Evolution Strategy)
Genome: 42 dimensions (one ratio per layer)
Fitness: CLIcK 60% + MuSR 40% composite score
Frozen layers: L15, L16, L22, L23, L24, L25 (Korean language preservation)
Optimal Genome Discovered by CMA-ES
L00: 0.206 โโโโโโโโโโโ 21% Qwen
L07: 0.000 โโโโโโโโโโโ Auto-protected by CMA-ES
L15: 0.000 โโโโโโโโโโโ Frozen (Korean)
L22: 0.000 โโโโโโโโโโโ Frozen (Korean)
L29: 0.291 โโโโโโโโโโโโโโโ 29% Qwen (maximum)
L31: 0.244 โโโโโโโโโโโโโ 24% Qwen
L32: 0.273 โโโโโโโโโโโโโโ 27% Qwen
Key finding: CMA-ES applied the most aggressive Qwen blending to the final layers (L29-32), which govern output quality. The algorithm determined that "Qwen's generation quality exceeds Darwin's" for those specific layers โ while simultaneously protecting critical layers (L7, L18, L28) by driving their ratios to zero.
Training Cost
| This Model | Typical Hybrid | |
|---|---|---|
| GPU | H100 ร 1 | Hundreds to thousands |
| Time | 155 minutes | Weeks to months |
| Training data | 0 tokens | Trillions of tokens |
| Training compute | Fitness evaluation only | Full pre-training |
Genealogy
google/gemma-4-E4B-it ร TeichAI/Claude-Opus-Distill-E4B
โ Darwin-4B-Opus (Gen 1, DARE-TIES merge)
Darwin-4B-Opus ร DavidAU/DECKARD-Expresso-Universe
โ Darwin-4B-David (Gen 2, MRI-guided merge, CLIcK 90%)
Darwin-4B-David ร Qwen/Qwen3.5-4B
โ Darwin-4B-Genesis (Gen 3, Cross-Arch FFN Breeding, CLIcK 92%) โ
DNA Composition
Gemma4 Transformer (skeleton, Attention) ~50%
Claude Opus Distill (reasoning patterns) ~20%
DECKARD Universe (Korean, creativity) ~15%
Qwen3.5 GatedDeltaNet (Mamba FFN) ~15%
What Is FFN Breeding?
AI models have two main components:
- Attention = the brain (decides what to focus on, reasoning chains)
- FFN = the muscles (stores knowledge, processes patterns)
Darwin-4B-Genesis keeps the brain from the father (Transformer) and blends in muscles from the mother (Mamba) at optimal ratios. As long as the FFN input/output dimensions match (hidden_size=2560), the swap works โ like a USB-C port that accepts any compatible charger.
Usage
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained(
"FINAL-Bench/Darwin-4B-Genesis",
trust_remote_code=True,
)
model = AutoModelForCausalLM.from_pretrained(
"FINAL-Bench/Darwin-4B-Genesis",
dtype="bfloat16",
device_map="auto",
trust_remote_code=True,
)
messages = [{"role": "user", "content": "Explain how hybrid vigor works in genetics."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=False)
print(tokenizer.decode(outputs[0][inputs['input_ids'].shape[-1]:], skip_special_tokens=True))
Hardware Requirements
| Setup | VRAM | Status |
|---|---|---|
| NVIDIA RTX 4090 (24GB) | 24 GB | BF16 fits |
| NVIDIA RTX 3090 (24GB) | 24 GB | BF16 fits |
| NVIDIA H100 (93GB) | 93 GB | Comfortable |
| Mac M3 Max (36GB) | 36 GB | Comfortable |
Dense 4B model โ runs on a single consumer GPU.
Model Specifications
| Architecture | Gemma4 Dense (Transformer Attention + Mamba FFN hybrid) |
| Effective Parameters | 4B (8B total with PLE) |
| Hidden Size | 2560 |
| Intermediate Size | 10240 |
| Layers | 42 |
| Context Length | 32,768 |
| License | Apache 2.0 |
How This Differs from Prior Work
| Existing Hybrids | Darwin-4B-Genesis | |
|---|---|---|
| Examples | Jamba, Nemotron-H, Granite 4.0 | This model |
| Method | Design โ train from scratch | Breed trained models โ zero training |
| Cost | Thousands of GPUยทhours | H100 ร 1, 2.6 hours |
| Data | Trillions of tokens | 0 tokens (fitness eval only) |
| Ratio selection | Manual architecture design | CMA-ES 42D automatic search |
| Hybrid Vigor | Not tested | Benchmarked and confirmed |
Future Work
- Cross-breeding with RWKV-7, xLSTM, and other architectures
- Scaling to 31B/35B models with the same technique
- Paper: "Cross-Architecture FFN Breeding with Evolutionary Optimization"
- Patents: Methods for selective FFN transplantation across architectures
Acknowledgements
- Korean Government โ GPU Support Program research grant
- Google โ Gemma4 E4B architecture
- Alibaba Qwen Team โ Qwen3.5-4B GatedDeltaNet
- TeichAI โ Claude Opus Distill model
- DavidAU โ DECKARD-Expresso-Universe model
- Jackrong โ Claude 4.6 Opus Reasoning Distilled
Citation
@misc{vidraft_darwin_4b_genesis,
title = {Darwin-4B-Genesis: World's First Cross-Architecture FFN Breeding},
author = {VIDRAFT},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/FINAL-Bench/Darwin-4B-Genesis}}
}
- Downloads last month
- 41
Model tree for FINAL-Bench/Darwin-4B-Genesis
Collection including FINAL-Bench/Darwin-4B-Genesis
Evaluation results
- Accuracy on CLIcKself-reported92.000
- Accuracy on MuSRself-reported70.000