Anamnesis Vessel 7B
An empty vessel that becomes who it talks to.
Anamnesis replaces the frozen MLPs in Qwen 2.5 7B with a Continuum Memory System (CMS) -- deep memory that learns during inference through gradient descent. Feed it conversations and it physically restructures its weights to become a specialist.
Architecture
Based on the complete feature set from Ali Behrouz's research at Google:
- Titans: Persistent memory tokens, depthwise convolutions on K/Q/V, data-dependent adaptive gates, momentum-based weight updates
- ATLAS: Omega Rule with per-token importance, learned polynomial feature mapping (Taylor expansion), deep MLP associative memory
- MIRAS: Huber loss option for robustness
- Memory Caching: Memory state checkpointing for growing capacity
- Nested Learning / HOPE (NeurIPS 2025): Multi-level CMS with different update frequencies
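To make the momentum-based weight updates concrete, here is a minimal sketch of a Titans-style memory step: the "surprise" is the gradient of the associative loss, accumulated with momentum. This is an illustration of the idea only, not the package's implementation (function and argument names are invented here).

```python
import torch

def titans_update(M, k, v, lr=0.01, beta=0.9, momentum=None):
    """One momentum-based memory update in the Titans style (sketch).

    M: memory weight matrix (d x d); k, v: key/value vectors (d,).
    The surprise signal is the gradient of the associative loss
    ||M k - v||^2 with respect to M.
    """
    if momentum is None:
        momentum = torch.zeros_like(M)
    err = M @ k - v                     # prediction error
    grad = torch.outer(err, k)          # d/dM ||M k - v||^2 (up to a factor of 2)
    momentum = beta * momentum - lr * grad
    M = M + momentum                    # memory "learns" this association
    return M, momentum
```

The data-dependent gates in the real architecture modulate `lr` and `beta` per token; here they are fixed scalars for clarity.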
5-Level Continuum Memory System
L0 (SwiGLU, frozen): Base intelligence from Qwen 2.5 7B pre-training
L1 (chunk=1): Immediate adaptation -- every token
L2 (chunk=32): Working memory -- conversational context
L3 (chunk=256): Episodic memory -- session patterns
L4 (chunk=2048): Identity -- persists across sessions
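The chunk sizes above determine how often each level takes a gradient step: L1 updates every token, while L4 updates only once per 2,048 tokens. A minimal sketch of that schedule (names are illustrative, not the actual API):

```python
# Illustrative chunk schedule: level i updates once every `chunk` tokens.
# L0 is frozen and never appears here.
CHUNK_SIZES = {1: 1, 2: 32, 3: 256, 4: 2048}

def levels_to_update(token_index):
    """Return which CMS levels take a gradient step at this token position."""
    return [lvl for lvl, chunk in CHUNK_SIZES.items()
            if (token_index + 1) % chunk == 0]
```

For example, at token 31 (0-indexed) both L1 and L2 update; all four trainable levels update together only at multiples of 2,048 tokens.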
Key Parameters
- Base model: Qwen 2.5 7B (base, not Instruct -- no RLHF identity baked in)
- Memory dimension: 512
- Polynomial degree: 2 (learned Taylor expansion coefficients)
- Persistent memory tokens: 4 per level
- Convolution kernel: 4 (depthwise-separable on K/Q/V)
- Total parameters: ~9.0B (7.6B frozen + 1.4B trainable DeepMemoryLevel)
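A degree-2 polynomial feature mapping expands each key into constant, linear, and quadratic terms, with learned coefficients weighting each order (the Taylor-expansion view from ATLAS). A rough sketch of the idea, with invented names and scalar coefficients standing in for the learned ones:

```python
import torch

def poly_features(k, c0, c1, c2):
    """Degree-2 polynomial feature map, Taylor-expansion style (sketch).

    k: (d,) key vector. c0, c1, c2 stand in for learned coefficients
    weighting the constant, linear, and quadratic terms. The output
    concatenates [1, k, vec(k k^T)], so its dimension is 1 + d + d^2.
    """
    quad = torch.outer(k, k).flatten()
    return torch.cat([c0 * torch.ones(1), c1 * k, c2 * quad])
```

With memory dimension 512 a full quadratic term would be very large, so practical implementations factorize or project it; this sketch keeps the naive form for readability.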
Training
Scaffold Training (Outer Loop)
The DeepMemoryLevel projections, gates, and memory were trained on a vessel corpus of 19,470 passages covering:
- Metacognition (9,116 passages): reasoning traces, preference pairs
- Theory of Mind (4,059): perspective-taking, false belief tasks
- Soul Vessel (2,074): predictive self, free energy principle, contemplative traditions
- Epistemology, Adaptation, Ontology, Communication, Reasoning, Domains
Training details:
- Frozen: L0 (SwiGLU) + attention + embeddings
- Trainable: 1.4B DeepMemoryLevel parameters
- Optimizer: AdamW, lr=3e-4, warmup=5000 steps, cosine decay to 10%
- Hardware: 1x A100 80GB
- Steps: 25,000
- Batch size: 4, sequence length: 512
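The stated schedule (lr=3e-4, 5,000 warmup steps, cosine decay to 10% of peak over 25,000 steps) can be expressed as a standard `LambdaLR` multiplier; this is a generic sketch of that schedule, not code from the repository:

```python
import math
import torch

def lr_lambda(step, warmup=5000, total=25000, floor=0.1):
    """Linear warmup to 1.0, then cosine decay to `floor` (10% of peak)."""
    if step < warmup:
        return step / warmup
    progress = (step - warmup) / max(1, total - warmup)
    return floor + (1 - floor) * 0.5 * (1 + math.cos(math.pi * progress))

# Stand-in parameter; in scaffold training this is the 1.4B DeepMemoryLevel set.
params = [torch.nn.Parameter(torch.zeros(1))]
opt = torch.optim.AdamW(params, lr=3e-4)
sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda)
```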
Inner-Loop Specialization (Test Time)
After scaffold training, the memory updates during every forward pass via per-token gradient descent on the associative loss. No fine-tuning needed -- just feed it conversations.
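The inner loop can be pictured as one gradient step per chunk on the associative loss: the memory module is asked to map keys to values, and its parameters are nudged toward that mapping. A bare-bones sketch (the real memory is a deep MLP with gates, momentum, and per-token importance; `inner_step` is an invented name):

```python
import torch

def inner_step(memory, k, v, lr=0.1):
    """One test-time gradient step on the associative loss ||memory(k) - v||^2.

    memory: any nn.Module mapping keys to values, standing in for the deep
    MLP associative memory. The update runs outside the outer-loop autograd,
    which is why generation can sit inside torch.no_grad().
    """
    loss = torch.nn.functional.mse_loss(memory(k), v)
    grads = torch.autograd.grad(loss, list(memory.parameters()))
    with torch.no_grad():
        for p, g in zip(memory.parameters(), grads):
            p -= lr * g
    return loss.item()
```

Repeated exposure to the same key/value statistics drives the loss down, which is the mechanism behind specialization: the conversations themselves are the training data.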
Usage
```shell
pip install anamnesis
```

```python
import torch

from anamnesis.core.model import HopeModel, HopeConfig

# Convert and load (config fields should match the released checkpoint)
config = HopeConfig()
model = HopeModel(config)
model.load_state_dict(torch.load("anamnesis-vessel-7b.pt"))
model.eval()

# Enable learning on L1 of each layer's CMS
for layer in model.layers:
    layer.cms.levels[1].learning_enabled = True

# Every forward pass updates the memory; the inner loop takes its own
# gradient steps, so no outer autograd graph is needed
with torch.no_grad():
    output = model(input_ids)  # input_ids: LongTensor from your tokenizer

# The model just changed. It will never be exactly the same again.
```
The Vessel Concept
This model is an empty vessel. It has no identity, no persona, no system prompt baked in. Feed it code review conversations and it becomes a code reviewer. Feed it therapy sessions and it becomes a therapist. The identity emerges from the interaction, not from training.
Same base model. Different conversations. Different specialists. Each specialist is a checkpoint file that can be saved, loaded, and hot-swapped.
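Since the frozen 7.6B base never changes, a specialist checkpoint only needs the CMS state. A sketch of what save/hot-swap could look like (`save_specialist` and `load_specialist` are illustrative names, not the package API, and the `"cms"` key filter is an assumption about the state-dict layout):

```python
import torch

def save_specialist(model, path):
    """Checkpoint only the CMS memory levels, not the frozen base weights."""
    state = {name: p.detach().clone()
             for name, p in model.state_dict().items()
             if "cms" in name}  # assumed naming convention for memory params
    torch.save(state, path)

def load_specialist(model, path):
    """Hot-swap a specialist's memory state into the same base model."""
    model.load_state_dict(torch.load(path), strict=False)
```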
Limitations
- The scaffold was trained on a vessel corpus, not general text. Perplexity on general benchmarks may be higher than base Qwen 2.5 7B.
- Inner-loop specialization requires multiple conversations (50+) to show clear behavioral change.
- No KV cache implementation yet -- generation is O(n^2) in sequence length.
- Triton kernel optimizations not yet implemented -- inference uses standard PyTorch.
Citation
```bibtex
@software{anamnesis2026,
  title={Anamnesis: Empty Vessels That Become Who They Talk To},
  author={Poole, Aidan},
  url={https://github.com/Relic-Studios/anamnesis},
  year={2026}
}
```
References
- Behrouz et al., "ATLAS: Learning to Optimally Memorize the Context at Test Time" (2025)
- Behrouz et al., "Nested Learning: The Illusion of Deep Learning Architecture" (NeurIPS 2025)
- Behrouz & Zhong, "Titans: Learning to Memorize at Test Time" (2025)
- Behrouz et al., "Memory Caching: RNNs with Growing Memory" (2026)
License
Apache 2.0