Anamnesis Vessel 7B
An empty vessel that becomes who it talks to.
Anamnesis replaces the frozen MLPs in Qwen 2.5 7B with a Continuum Memory System (CMS) -- deep memory that learns during inference through gradient descent. Feed it conversations and it physically restructures its weights to become a specialist.
Architecture
Based on the complete feature set from Ali Behrouz's research at Google:
- Titans: Persistent memory tokens, depthwise convolutions on K/Q/V, data-dependent adaptive gates, momentum-based weight updates
- ATLAS: Omega Rule with per-token importance, learned polynomial feature mapping (Taylor expansion), deep MLP associative memory
- MIRAS: Huber loss option for robustness
- Memory Caching: Memory state checkpointing for growing capacity
- Nested Learning / HOPE (NeurIPS 2025): Multi-level CMS with different update frequencies
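To make the momentum-based weight updates concrete, here is a minimal sketch of a Titans-style memory step: the "surprise" is the gradient of the associative loss, accumulated with momentum. This is an illustration of the idea only, not the package's implementation (function and argument names are invented here).

```python
import torch

def titans_update(M, k, v, lr=0.01, beta=0.9, momentum=None):
    """One momentum-based memory update in the Titans style (sketch).

    M: memory weight matrix (d x d); k, v: key/value vectors (d,).
    The surprise signal is the gradient of the associative loss
    ||M k - v||^2 with respect to M.
    """
    if momentum is None:
        momentum = torch.zeros_like(M)
    err = M @ k - v                     # prediction error
    grad = torch.outer(err, k)          # d/dM ||M k - v||^2 (up to a factor of 2)
    momentum = beta * momentum - lr * grad
    M = M + momentum                    # memory "learns" this association
    return M, momentum
```

The data-dependent gates in the real architecture modulate `lr` and `beta` per token; here they are fixed scalars for clarity.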
5-Level Continuum Memory System
L0 (SwiGLU, frozen): Base intelligence from Qwen 2.5 7B pre-training
L1 (chunk=1): Immediate adaptation -- every token
L2 (chunk=32): Working memory -- conversational context
L3 (chunk=256): Episodic memory -- session patterns
L4 (chunk=2048): Identity -- persists across sessions
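The chunk sizes above determine how often each level takes a gradient step: L1 updates every token, while L4 updates only once per 2,048 tokens. A minimal sketch of that schedule (names are illustrative, not the actual API):

```python
# Illustrative chunk schedule: level i updates once every `chunk` tokens.
# L0 is frozen and never appears here.
CHUNK_SIZES = {1: 1, 2: 32, 3: 256, 4: 2048}

def levels_to_update(token_index):
    """Return which CMS levels take a gradient step at this token position."""
    return [lvl for lvl, chunk in CHUNK_SIZES.items()
            if (token_index + 1) % chunk == 0]
```

For example, at token 31 (0-indexed) both L1 and L2 update; all four trainable levels update together only at multiples of 2,048 tokens.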
Key Parameters
- Base model: Qwen 2.5 7B (base, not Instruct -- no RLHF identity baked in)
- Memory dimension: 512
- Polynomial degree: 2 (learned Taylor expansion coefficients)
- Persistent memory tokens: 4 per level
- Convolution kernel: 4 (depthwise-separable on K/Q/V)
- Total parameters: ~9.0B (7.6B frozen + 1.4B trainable DeepMemoryLevel)
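A degree-2 polynomial feature mapping expands each key into constant, linear, and quadratic terms, with learned coefficients weighting each order (the Taylor-expansion view from ATLAS). A rough sketch of the idea, with invented names and scalar coefficients standing in for the learned ones:

```python
import torch

def poly_features(k, c0, c1, c2):
    """Degree-2 polynomial feature map, Taylor-expansion style (sketch).

    k: (d,) key vector. c0, c1, c2 stand in for learned coefficients
    weighting the constant, linear, and quadratic terms. The output
    concatenates [1, k, vec(k k^T)], so its dimension is 1 + d + d^2.
    """
    quad = torch.outer(k, k).flatten()
    return torch.cat([c0 * torch.ones(1), c1 * k, c2 * quad])
```

With memory dimension 512 a full quadratic term would be very large, so practical implementations factorize or project it; this sketch keeps the naive form for readability.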
Training
Scaffold Training (Outer Loop)
The DeepMemoryLevel projections, gates, and memory were trained on a vessel corpus of 19,470 passages covering:
- Metacognition (9,116 passages): reasoning traces, preference pairs
- Theory of Mind (4,059): perspective-taking, false belief tasks
- Soul Vessel (2,074): predictive self, free energy principle, contemplative traditions
- Epistemology, Adaptation, Ontology, Communication, Reasoning, Domains
Training details:
- Frozen: L0 (SwiGLU) + attention + embeddings
- Trainable: 1.4B DeepMemoryLevel parameters
- Optimizer: AdamW, lr=3e-4, warmup=5000 steps, cosine decay to 10%
- Hardware: 1x A100 80GB
- Steps: 25,000
- Batch size: 4, sequence length: 512
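The stated schedule (lr=3e-4, 5,000 warmup steps, cosine decay to 10% of peak over 25,000 steps) can be expressed as a standard `LambdaLR` multiplier; this is a generic sketch of that schedule, not code from the repository:

```python
import math
import torch

def lr_lambda(step, warmup=5000, total=25000, floor=0.1):
    """Linear warmup to 1.0, then cosine decay to `floor` (10% of peak)."""
    if step < warmup:
        return step / warmup
    progress = (step - warmup) / max(1, total - warmup)
    return floor + (1 - floor) * 0.5 * (1 + math.cos(math.pi * progress))

# Stand-in parameter; in scaffold training this is the 1.4B DeepMemoryLevel set.
params = [torch.nn.Parameter(torch.zeros(1))]
opt = torch.optim.AdamW(params, lr=3e-4)
sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda)
```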
Inner-Loop Specialization (Test Time)
After scaffold training, the memory updates during every forward pass via per-token gradient descent on the associative loss. No fine-tuning needed -- just feed it conversations.
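The inner loop can be pictured as one gradient step per chunk on the associative loss: the memory module is asked to map keys to values, and its parameters are nudged toward that mapping. A bare-bones sketch (the real memory is a deep MLP with gates, momentum, and per-token importance; `inner_step` is an invented name):

```python
import torch

def inner_step(memory, k, v, lr=0.1):
    """One test-time gradient step on the associative loss ||memory(k) - v||^2.

    memory: any nn.Module mapping keys to values, standing in for the deep
    MLP associative memory. The update runs outside the outer-loop autograd,
    which is why generation can sit inside torch.no_grad().
    """
    loss = torch.nn.functional.mse_loss(memory(k), v)
    grads = torch.autograd.grad(loss, list(memory.parameters()))
    with torch.no_grad():
        for p, g in zip(memory.parameters(), grads):
            p -= lr * g
    return loss.item()
```

Repeated exposure to the same key/value statistics drives the loss down, which is the mechanism behind specialization: the conversations themselves are the training data.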
Usage
```shell
pip install anamnesis
```

```python
import torch

from anamnesis.core.model import HopeModel, HopeConfig

# Convert and load (config fields should match the released checkpoint)
config = HopeConfig()
model = HopeModel(config)
model.load_state_dict(torch.load("anamnesis-vessel-7b.pt"))
model.eval()

# Enable learning on L1 of each layer's CMS
for layer in model.layers:
    layer.cms.levels[1].learning_enabled = True

# Every forward pass updates the memory; the inner loop takes its own
# gradient steps, so no outer autograd graph is needed
with torch.no_grad():
    output = model(input_ids)  # input_ids: LongTensor from your tokenizer

# The model just changed. It will never be exactly the same again.
```
The Vessel Concept
This model is an empty vessel. It has no identity, no persona, no system prompt baked in. Feed it code review conversations and it becomes a code reviewer. Feed it therapy sessions and it becomes a therapist. The identity emerges from the interaction, not from training.
Same base model. Different conversations. Different specialists. Each specialist is a checkpoint file that can be saved, loaded, and hot-swapped.
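Since the frozen 7.6B base never changes, a specialist checkpoint only needs the CMS state. A sketch of what save/hot-swap could look like (`save_specialist` and `load_specialist` are illustrative names, not the package API, and the `"cms"` key filter is an assumption about the state-dict layout):

```python
import torch

def save_specialist(model, path):
    """Checkpoint only the CMS memory levels, not the frozen base weights."""
    state = {name: p.detach().clone()
             for name, p in model.state_dict().items()
             if "cms" in name}  # assumed naming convention for memory params
    torch.save(state, path)

def load_specialist(model, path):
    """Hot-swap a specialist's memory state into the same base model."""
    model.load_state_dict(torch.load(path), strict=False)
```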
Limitations
- The scaffold was trained on a vessel corpus, not general text. Perplexity on general benchmarks may be higher than base Qwen 2.5 7B.
- Inner-loop specialization requires multiple conversations (50+) to show clear behavioral change.
- No KV cache implementation yet -- generation is O(n^2) in sequence length.
- Triton kernel optimizations not yet implemented -- inference uses standard PyTorch.
Citation
```bibtex
@software{anamnesis2026,
  title={Anamnesis: Empty Vessels That Become Who They Talk To},
  author={Poole, Aidan},
  url={https://github.com/Relic-Studios/anamnesis},
  year={2026}
}
```
References
- Behrouz et al., "ATLAS: Learning to Optimally Memorize the Context at Test Time" (2025)
- Behrouz et al., "Nested Learning: The Illusion of Deep Learning Architecture" (NeurIPS 2025)
- Behrouz & Zhong, "Titans: Learning to Memorize at Test Time" (2025)
- Behrouz et al., "Memory Caching: RNNs with Growing Memory" (2026)
License
Apache 2.0