HRM-MoE: Hierarchical Recurrent Memory with Mixture of Experts

HRM-MoE is an experimental language model that combines:

  • Hierarchical Recurrent Memory (HRM) architecture for deep reasoning
  • Mixture of Experts (MoE) for efficient scaling

Model Description

This model integrates HRM's Specialist/Manager hierarchy into a Mixture of Experts framework, letting heterogeneous experts specialize in different aspects of language understanding.
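
To make the idea concrete, here is a rough sketch of what an HRM expert could look like: a slow "Manager" state steering several fast "Specialist" refinement steps. The module below is a hedged illustration; the GRU cells, step count, and all names are assumptions, not the released code.

import torch
import torch.nn as nn

class HRMExpert(nn.Module):
    """Hedged sketch of a hierarchical expert, not the repo's implementation."""
    def __init__(self, d_model=512, n_steps=4):
        super().__init__()
        self.specialist = nn.GRUCell(2 * d_model, d_model)  # fast low-level steps
        self.manager = nn.GRUCell(d_model, d_model)         # slow high-level state
        self.out = nn.Linear(d_model, d_model)
        self.n_steps = n_steps

    def forward(self, x):  # x: (tokens, d_model)
        h_spec = torch.zeros_like(x)
        h_mgr = torch.zeros_like(x)
        for _ in range(self.n_steps):
            # Specialist refines quickly, conditioned on the Manager's plan
            h_spec = self.specialist(torch.cat([x, h_mgr], dim=-1), h_spec)
            # Manager updates slowly from the Specialist's result
            h_mgr = self.manager(h_spec, h_mgr)
        return self.out(h_spec)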

Architecture

  • Model Size: 228,352,512 parameters
    • Expert Parameters: 204,506,112
    • Non-Expert Parameters: 23,846,400
  • Embedding Dimension: 512
  • Layers: 6
  • Attention Heads: 8
  • FFN Dimension: 2048
  • Number of Experts: 8
  • Experts per Token: 2
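
For reference, the hyperparameters above can be collected into a single config object (a hypothetical dataclass, not the repo's actual API; the vocabulary size is an assumption based on the t5-small tokenizer used in the Usage section):

from dataclasses import dataclass

@dataclass
class HRMMoEConfig:
    vocab_size: int = 32100  # t5-small tokenizer vocabulary (assumption)
    d_model: int = 512       # embedding dimension
    n_layers: int = 6
    n_heads: int = 8
    d_ffn: int = 2048
    n_experts: int = 8
    top_k: int = 2           # experts activated per token

# Sanity check: expert and non-expert parameters sum to the reported total
assert 204_506_112 + 23_846_400 == 228_352_512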

Expert Types

  1. GLU/GEGLU Experts (4): Standard gated linear units
  2. Pattern Experts (2): Deep FFN for pattern recognition
  3. Local Conv Experts (1): Local neighborhood operations
  4. HRM Experts (1): Hierarchical reasoning with Specialist/Manager modules
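
A hedged sketch of how such a heterogeneous expert list might be assembled (the helper classes are placeholders for the repo's implementations; HRMExpert is the sketch from the Model Description section):

import torch.nn as nn
import torch.nn.functional as F

def geglu_expert(d_model=512, d_ffn=2048):
    # GEGLU: GELU-gated linear unit, projected back down to d_model
    class GEGLU(nn.Module):
        def __init__(self):
            super().__init__()
            self.gate = nn.Linear(d_model, d_ffn)
            self.up = nn.Linear(d_model, d_ffn)
            self.down = nn.Linear(d_ffn, d_model)
        def forward(self, x):
            return self.down(F.gelu(self.gate(x)) * self.up(x))
    return GEGLU()

def pattern_expert(d_model=512, d_ffn=2048):
    # Deeper FFN stack aimed at pattern recognition
    return nn.Sequential(
        nn.Linear(d_model, d_ffn), nn.GELU(),
        nn.Linear(d_ffn, d_ffn), nn.GELU(),
        nn.Linear(d_ffn, d_model),
    )

def conv_expert(d_model=512, kernel=3):
    # Depthwise conv over a local token neighborhood; assumes the expert
    # receives contiguous (batch, seq, d_model) slices
    class LocalConv(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv = nn.Conv1d(d_model, d_model, kernel,
                                  padding=kernel // 2, groups=d_model)
        def forward(self, x):
            return self.conv(x.transpose(1, 2)).transpose(1, 2)
    return LocalConv()

experts = nn.ModuleList(
    [geglu_expert() for _ in range(4)]       # 1. GLU/GEGLU experts
    + [pattern_expert() for _ in range(2)]   # 2. pattern experts
    + [conv_expert()]                        # 3. local conv expert
    + [HRMExpert()]                          # 4. HRM expert (sketch above)
)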

Training

  • Dataset: wikimedia/wikipedia
  • Training Epochs: 1
  • Batch Size: 8 (effective: 32)
  • Learning Rate: 5e-05 → 1e-06 (cosine)
  • Mixed Precision: Enabled
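
The effective batch size of 32 suggests gradient accumulation over 4 micro-batches of 8 (an inference; data parallelism could produce the same effective size). Below is a minimal sketch of the schedule and mixed precision using standard PyTorch pieces, assuming `model`, `loader`, and `total_steps` come from the training script:

import torch

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
# Cosine decay from 5e-5 down to the 1e-6 floor
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=total_steps, eta_min=1e-6)
scaler = torch.cuda.amp.GradScaler()  # mixed precision

accum_steps = 32 // 8  # effective batch 32 from micro-batch 8
for step, batch in enumerate(loader):
    with torch.cuda.amp.autocast():
        loss = model(batch) / accum_steps  # assumes model returns a scalar loss
    scaler.scale(loss).backward()
    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()
        scheduler.step()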

Latest Performance (Epoch 0)

  • Validation Loss: 6.3832
  • Validation Perplexity: 591.83
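
Perplexity is simply the exponential of the validation loss, which the numbers above satisfy:

import math

assert abs(math.exp(6.3832) - 591.83) < 0.1  # exp(loss) ≈ perplexity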

Features

  • Adaptive Routing: Gumbel-Softmax with temperature annealing
  • Load Balancing: Importance, load, and entropy regularization
  • Expert Specialization: Diverse expert types for different aspects of language
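
A minimal sketch of the routing and auxiliary losses described above, assuming a standard Gumbel-Softmax top-k formulation (the exact loss weights and annealing schedule are not stated in this card):

import torch
import torch.nn.functional as F

def route(logits, k=2, tau=1.0):
    """Gumbel-Softmax top-k routing; logits: (tokens, n_experts).
    tau is annealed toward a small value during training."""
    gumbel = -torch.log(-torch.log(torch.rand_like(logits) + 1e-9) + 1e-9)
    probs = F.softmax((logits + gumbel) / tau, dim=-1)
    weights, idx = probs.topk(k, dim=-1)               # 2 experts per token
    weights = weights / weights.sum(-1, keepdim=True)  # renormalize top-k
    return weights, idx, probs

def aux_losses(probs, idx, n_experts=8):
    # Importance: squared coefficient of variation of per-expert probability mass
    importance = probs.sum(0)
    imp_loss = importance.var() / (importance.mean() ** 2 + 1e-9)
    # Load: variation in how many tokens each expert actually receives
    load = F.one_hot(idx, n_experts).sum(dim=(0, 1)).float()
    load_loss = load.var() / (load.mean() ** 2 + 1e-9)
    # Entropy of per-token routing; its negative is added to encourage exploration
    entropy = -(probs * probs.clamp_min(1e-9).log()).sum(-1).mean()
    return imp_loss, load_loss, entropy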

Usage

import torch
from transformers import T5Tokenizer

# Load tokenizer (the model reuses the t5-small vocabulary)
tokenizer = T5Tokenizer.from_pretrained("t5-small", use_fast=False)

# Load model (you'll need the model architecture from the repo)
# See: https://github.com/your-repo/hrm-moe
# `HRMMoE` and the checkpoint name below are illustrative, not the repo's API:
# from hrm_moe import HRMMoE
# model = HRMMoE(config)
# model.load_state_dict(torch.load("hrm_moe.pt"))
# model.eval()

# Generate text (greedy next-token sketch, assuming the model returns logits)
# input_ids = tokenizer("The quick brown fox", return_tensors="pt").input_ids
# with torch.no_grad():
#     logits = model(input_ids)
# next_token = tokenizer.decode(logits[0, -1].argmax())

Citation

If you use this model, please cite the original HRM paper:

@article{hrm2024,
  title={Hierarchical Reasoning Model},
  author={...},
  journal={arXiv preprint},
  year={2024}
}

License

Apache 2.0


🤖 Generated with HRM-MoE Training Script
