HRM Maze 30x30 Hard

A Hierarchical Reasoning Model (HRM) trained to solve hard 30×30 maze navigation problems using hierarchical processing and adaptive computation.

Model Details

Model Description

This is a Hierarchical Reasoning Model checkpoint fine-tuned specifically for solving hard maze pathfinding problems on 30×30 grids. The model employs a two-level hierarchical architecture inspired by human cognition, with high-level (H) modules for abstract route planning and low-level (L) modules for detailed navigation decisions. It uses Adaptive Computation Time (ACT) with Q-learning based halting to dynamically allocate computational resources.

The model processes maze grids up to 30×30 (900 tokens) and predicts optimal navigation paths through complex maze environments.
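
For intuition, here is a minimal sketch of one hierarchical reasoning segment. The interfaces are simplified: h_module and l_module stand in for the real 4-layer transformer blocks, and the additive conditioning is illustrative rather than the checkpoint's actual wiring.

import torch

def hrm_segment(h_state, l_state, x_emb, h_module, l_module,
                h_cycles=2, l_cycles=2):
    # The L-module runs several fast cycles under a fixed high-level
    # plan; the H-module then updates that plan from the L-module's
    # result. Conditioning is sketched here as simple addition.
    for _ in range(h_cycles):
        for _ in range(l_cycles):
            l_state = l_module(l_state + h_state + x_emb)
        h_state = h_module(h_state + l_state)
    return h_state, l_state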

  • Developed by: Sapient Inc.
  • Model type: Hierarchical Reasoning Model (HRM)
  • Language(s): Symbolic reasoning (maze navigation symbols)
  • License: Apache 2.0
  • Original checkpoint: sapientinc/HRM-checkpoint-maze-30x30-hard

Model Sources

  • Paper: Hierarchical Reasoning Model (arXiv:2506.21734)
  • Original checkpoint: sapientinc/HRM-checkpoint-maze-30x30-hard

Uses

Direct Use

This model is designed for solving hard maze navigation problems. It can:

  • Find optimal paths through complex 30×30 maze environments
  • Navigate mazes with multiple obstacles and dead ends
  • Process partial maze representations and predict navigation sequences
  • Demonstrate hierarchical planning strategies for spatial reasoning tasks

Downstream Use

The model can be used as:

  • A component in game AI and procedural content generation
  • A baseline for research in hierarchical spatial reasoning
  • An example of applying neural networks to pathfinding and navigation problems
  • A planning module in robotics and autonomous navigation research

Recommendations

Users should be aware that:

  • The model is specialized for maze pathfinding and should not be used for general spatial reasoning tasks
  • Input must be formatted as a flattened grid of tokens drawn from the 6-token vocabulary (see the encoding sketch after this list)
  • Inference time may vary due to the adaptive computation mechanism
  • The model is optimized for hard difficulty mazes and may be over-engineered for simple mazes
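
As a toy illustration of that formatting requirement, a small grid can be flattened row-major into a single token sequence. The 0-5 element mapping here is illustrative (it matches the example mapping in the usage snippet below), not a confirmed specification.

import torch

# Toy 4x4 maze, flattened row-major into a (1, 16) token tensor.
# Illustrative mapping: 0=empty, 1=wall, 2=start, 3=goal, 4=path, 5=visited
grid = [
    [2, 0, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 3],
]
maze_tokens = torch.tensor(grid, dtype=torch.long).reshape(1, -1)
print(maze_tokens.shape)  # torch.Size([1, 16])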

How to Get Started with the Model

import torch
from transformers import HrmForCausalLM

# Load the converted checkpoint. If HrmForCausalLM is unavailable in
# your transformers build, the checkpoint may instead need to be loaded
# via AutoModelForCausalLM with trust_remote_code=True.
model = HrmForCausalLM.from_pretrained("zbloss/HRM-maze-30x30-hard")
model.eval()

# Prepare a maze grid (e.g., 20x20 = 400 tokens)
# Vocabulary: 0-5 representing different maze elements
# (e.g., 0=empty, 1=wall, 2=start, 3=goal, 4=path, 5=visited)
maze_grid = torch.randint(0, 6, (1, 400))  # random placeholder, not a solvable maze
puzzle_ids = torch.zeros(1, dtype=torch.long)
puzzle_ids = torch.zeros(1, dtype=torch.long)

# Run inference
with torch.no_grad():
    outputs = model(input_ids=maze_grid, puzzle_identifiers=puzzle_ids)

# Get predictions
predictions = torch.argmax(outputs.logits, dim=-1)
print(f"Predicted navigation path: {predictions}")
print(f"Q-halt: {outputs.q_halt_logits[0]:.4f}")
print(f"Q-continue: {outputs.q_continue_logits[0]:.4f}")

Training Details

Training Data

The model was trained on a dataset of hard difficulty 30×30 maze environments. These mazes feature:

  • Complex layouts with multiple branching paths
  • Dead ends requiring backtracking
  • Long optimal paths requiring multi-step planning
  • Variable start and goal positions

Training Procedure

The model uses a hierarchical architecture with:

  • High-level (H) module: 4 transformer layers for abstract route planning
  • Low-level (L) module: 4 transformer layers for detailed navigation decisions
  • H-cycles: 2 high-level reasoning cycles for strategic planning
  • L-cycles: 2 low-level computation cycles per H-cycle for tactical moves
  • ACT mechanism: Q-learning based adaptive halting, capped at 16 steps (sketched below)
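
A minimal sketch of that outer ACT loop, assuming simplified stand-ins: step_fn runs one full H/L reasoning segment and q_head produces the two halting Q-values, which the real model exposes as q_halt_logits and q_continue_logits in the usage example above.

def act_loop(carry, x, step_fn, q_head, max_steps=16):
    # Run reasoning segments until the learned Q-head prefers halting,
    # up to the 16-step cap.
    for _ in range(max_steps):
        carry, logits = step_fn(carry, x)   # one full H/L segment
        q_halt, q_continue = q_head(carry)  # learned halting values
        if q_halt > q_continue:
            break
    return carry, logits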

Training Hyperparameters

  • Training regime: bfloat16 mixed precision
  • Architecture: 4 H-layers, 4 L-layers, 8 attention heads
  • Hidden size: 512
  • Intermediate size: 1536
  • Max position embeddings: 900 (supports up to 30×30 grids)
  • Vocabulary size: 6 (maze navigation symbols)
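
For reference, these hyperparameters could be collected into a config along the following lines. The key names are hypothetical and may not match the checkpoint's actual config.json.

# Hypothetical config mirroring the hyperparameters above;
# key names are illustrative, not the checkpoint's actual schema.
hrm_config = {
    "vocab_size": 6,                 # maze navigation symbols
    "hidden_size": 512,
    "intermediate_size": 1536,
    "num_attention_heads": 8,
    "num_h_layers": 4,
    "num_l_layers": 4,
    "h_cycles": 2,
    "l_cycles": 2,
    "max_position_embeddings": 900,  # up to 30x30 grids
    "halt_max_steps": 16,            # ACT cap
    "torch_dtype": "bfloat16",
}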

Model Architecture

Technical Specifications

  • Total parameters: 27,270,658 (27.3M)
  • Model size: 109.09 MB
  • Vocabulary size: 6
  • Hidden size: 512
  • Intermediate size: 1536
  • H-level layers: 4
  • L-level layers: 4
  • Attention heads: 8
  • H-cycles: 2
  • L-cycles: 2
  • Max halting steps: 16
  • Max grid size: 30×30 (900 tokens)
  • Position encoding: RoPE (Rotary Position Embeddings)
  • Activation: SwiGLU
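
As a rough sanity check on the parameter count, standard attention and SwiGLU projection shapes (ignoring biases) account for nearly all of the reported total; the remaining ~7.7K parameters plausibly sit in the 6-token embedding and output head, the halting Q-head, and normalization weights. This is an estimate, not an official breakdown.

hidden, inter, layers = 512, 1536, 4 + 4  # 4 H-layers + 4 L-layers
attn = 4 * hidden * hidden                # Q, K, V, O projections
mlp = 3 * hidden * inter                  # SwiGLU: gate, up, down
print(layers * (attn + mlp))              # 27,262,976 of 27,270,658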

Model Architecture and Objective

The Hierarchical Reasoning Model (HRM) features:

  1. Two-level Hierarchical Processing:
    • H-level (High-level): Performs slow, abstract route planning and strategic navigation
    • L-level (Low-level): Executes fast, detailed navigation decisions and obstacle avoidance
  2. Adaptive Computation Time (ACT):
    • Q-learning based halting mechanism
    • Dynamically determines when sufficient computation has been performed
    • Allows variable computational depth based on maze complexity
    • More complex mazes with longer paths trigger more reasoning cycles
  3. Recurrent Carry State:
    • Maintains H and L hidden states across reasoning cycles
    • Enables iterative refinement of navigation strategies
    • Supports backtracking and path correction
  4. Positional Encoding:
    • RoPE (Rotary Position Embeddings) for position-aware attention (a minimal sketch follows this list)
    • Critical for spatial reasoning in grid-based environments
    • Supports up to 900 positions (30×30 grids)
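
A minimal sketch of rotary embeddings applied over a flattened 30×30 grid. The head dimension of 64 follows from the specifications above (512 hidden / 8 heads), while the base of 10000 is the common default rather than a confirmed value for this checkpoint.

import torch

def rope(x, positions, base=10000.0):
    # x: (seq, head_dim) queries or keys; positions: (seq,)
    d = x.shape[-1]
    inv_freq = base ** (-torch.arange(0, d, 2).float() / d)
    angles = positions.float()[:, None] * inv_freq[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin  # rotate each 2D pair
    out[..., 1::2] = x1 * sin + x2 * cos  # by its position angle
    return out

q = torch.randn(900, 64)            # one attention head over a 30x30 grid
q_rot = rope(q, torch.arange(900))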

Compute Infrastructure

Software

  • Framework: PyTorch with transformers library
  • Precision: bfloat16
  • Format: Safetensors

Performance

The model is designed to solve hard difficulty mazes on 30×30 grids, demonstrating:

  • Multi-step planning capabilities for long navigation sequences
  • Ability to recognize and avoid dead ends
  • Strategic backtracking when necessary
  • Hierarchical decomposition of complex navigation problems

Citation

BibTeX:

@article{wang2025hierarchical,
  title={Hierarchical Reasoning Model},
  author={Wang, Guan and Li, Jin and Sun, Yuhao and Chen, Xing and Liu, Changling and Wu, Yue and Lu, Meng and Song, Sen and Yadkori, Yasin Abbasi},
  journal={arXiv preprint arXiv:2506.21734},
  year={2025}
}

APA:

Wang, G., Li, J., Sun, Y., Chen, X., Liu, C., Wu, Y., Lu, M., Song, S., & Yadkori, Y. A. (2025). Hierarchical Reasoning Model. arXiv preprint arXiv:2506.21734.

More Information

This checkpoint is a converted version of the original HRM checkpoint from sapientinc/HRM-checkpoint-maze-30x30-hard, formatted for use with the HuggingFace transformers library.

For more details about the HRM architecture and training methodology, see the Hierarchical Reasoning Model paper cited above (arXiv:2506.21734).

Example Use Cases

  1. Game AI: Intelligent maze navigation in video games
  2. Path Planning Research: Baseline for hierarchical planning algorithms
  3. Robotics: Inspiration for hierarchical navigation strategies
  4. Education: Demonstrating neural approaches to classic AI problems

Model Card Contact

For questions or issues with this converted checkpoint, please open an issue in the transformers repository.
