Walrus: A Cross-Domain Foundation Model for Continuum Dynamics
Walrus is a large-scale physics foundation model capable of modeling a broad range of continuum dynamical systems.
Walrus is trained jointly across 19 diverse physical datasets spanning:
- astrophysics
- geoscience
- rheology
- plasma physics
- acoustics
- classical fluids
These systems have diverse boundary conditions and physical parameterizations. The model is optimized to serve as a general-purpose surrogate for physical simulation and a strong initialization for downstream fine-tuning on new PDE systems.
Model Description
Walrus is a 1.3B-parameter space-time Transformer trained autoregressively to predict the temporal evolution of physical fields. A simulation snapshot at time t is written as u(t).
We define the difference between two consecutive snapshots as: Δu(t+1) = u(t+1) − u(t)
Given a short history of τ snapshots: U(t) = [u(t − τ + 1), ..., u(t)]
The model predicts the next state using: u(t+1) ≈ u(t) + M(U(t))
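As a minimal illustration of this update rule, the sketch below performs one prediction step in PyTorch. The `(batch, time, channel, *spatial)` tensor layout and the name `predict_next_state` are illustrative assumptions, not the actual Walrus API; `model` stands in for the network M.

```python
import torch

def predict_next_state(model, history: torch.Tensor) -> torch.Tensor:
    """One autoregressive step: u(t+1) ~= u(t) + M(U(t)).

    history: (batch, tau, channels, *spatial), the last tau snapshots U(t).
    `model` is a placeholder for the Walrus network M.
    """
    u_t = history[:, -1]    # most recent snapshot u(t)
    delta = model(history)  # network predicts the increment Delta-u(t+1)
    return u_t + delta      # residual update yields u(t+1)
```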
Key architectural components
Adaptive-compute patch embedding
- Token count automatically balanced across resolutions
- Enables mixing 2D and 3D datasets efficiently
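The balancing idea can be sketched as follows: pick a per-axis patch size so that the resulting token count stays near a fixed budget regardless of resolution or dimensionality. The heuristic and the `token_budget` value below are hypothetical, not the exact Walrus rule.

```python
import math

def choose_patch_size(spatial_shape: tuple[int, ...], token_budget: int = 4096) -> tuple[int, ...]:
    """Pick a per-axis patch size so that prod(dim // patch) stays close to
    token_budget across input resolutions (hypothetical heuristic)."""
    cells_per_token = max(1.0, math.prod(spatial_shape) / token_budget)
    per_axis = max(1, round(cells_per_token ** (1 / len(spatial_shape))))
    return tuple(min(per_axis, d) for d in spatial_shape)

# A 2D 512x512 field and a 3D 64^3 volume get comparable token counts:
print(choose_patch_size((512, 512)))    # (8, 8)    -> 64*64 = 4096 tokens
print(choose_patch_size((64, 64, 64)))  # (4, 4, 4) -> 16^3  = 4096 tokens
```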
Patch Jittering
- A harmonic-analysis-motivated augmentation technique
- Reduces aliasing and spectral artifacts
- Improves long-horizon stability across 17/19 pretraining datasets
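A minimal sketch of the augmentation, assuming periodic boundaries so that a circular shift (`torch.roll`) is a valid transformation of the field:

```python
import torch

def jitter_patches(u: torch.Tensor, patch: tuple[int, ...]) -> torch.Tensor:
    """Shift the field by a uniform random per-axis offset smaller than the
    patch size before patch extraction, so patch boundaries do not always
    fall on the same grid lines (illustrative sketch)."""
    offsets = [int(torch.randint(0, p, ()).item()) for p in patch]
    spatial_dims = list(range(u.ndim - len(patch), u.ndim))  # trailing axes
    return torch.roll(u, shifts=offsets, dims=spatial_dims)
```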
Tensor-law-aware data augmentation
- 2D data embedded into 3D through plane rotations
- Vector/tensor fields rotated with correct physical transformations
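For intuition, the sketch below applies a 90-degree rotation about the z-axis to a 3D vector field, rotating the grid and the vector components together so that v'(x) = R v(R^-1 x). This is an illustrative special case, not the training augmentation code.

```python
import torch

def rotate_vector_field_90z(v: torch.Tensor) -> torch.Tensor:
    """Rotate a vector field 90 degrees about the z-axis.

    v: (3, X, Y, Z) with channel order (vx, vy, vz). Both the sample
    locations and the components must transform, or the field is no
    longer physically consistent.
    """
    # Rotate the grid in the x-y plane: (x, y) -> (-y, x).
    v = torch.rot90(v, k=1, dims=(1, 2))
    # Apply the same rotation R to the components: (vx, vy) -> (-vy, vx).
    vx, vy, vz = v[0], v[1], v[2]
    return torch.stack([-vy, vx, vz], dim=0)
```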
Asymmetric normalization
- Inputs are normalized by their RMS computed over space and time
- The predicted Δu is de-normalized using the RMS of Δu
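A sketch of this scheme, assuming a `(batch, time, channel, *spatial)` layout with statistics kept per field (channel):

```python
import torch

def normalize_inputs(history: torch.Tensor, eps: float = 1e-6):
    """Normalize each field by its RMS over space and time, returning the
    scale so predictions can be mapped back to physical units."""
    dims = (1, *range(3, history.ndim))  # reduce over time + spatial axes
    rms_u = history.pow(2).mean(dim=dims, keepdim=True).sqrt() + eps
    return history / rms_u, rms_u

def denormalize_delta(pred_delta: torch.Tensor, history: torch.Tensor, eps: float = 1e-6):
    """Rescale the predicted increment by the RMS of the *differences*
    Delta-u in the history window, not the RMS of u itself."""
    deltas = history[:, 1:] - history[:, :-1]
    dims = (1, *range(3, history.ndim))
    rms_d = deltas.pow(2).mean(dim=dims, keepdim=True).sqrt() + eps
    return pred_delta * rms_d.squeeze(1)  # (B, C, 1, ...) broadcasts over space
```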
Pretraining Details
Walrus is pretrained on 19 physical datasets with:
- Loss: Per-field normalized L1 loss (see the sketch after this list)
- Optimizer: AdamW
- Batching: System-uniform hierarchical sampling
- Time-striding: Random stride (1-5) per training example
- Patch jitter range: Uniform per-axis random offset
- Dimensional unification: 2D fields embedded as thin 3D volumes
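The loss referenced above can be sketched as follows; the exact normalization used in training may differ, and the `(batch, channel, *spatial)` layout is assumed:

```python
import torch

def normalized_l1_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Per-field normalized L1: each field's absolute error is scaled by
    that field's mean magnitude, so fields with very different units
    contribute comparably to the total loss."""
    dims = (0, *range(2, pred.ndim))  # reduce over everything but channel
    scale = target.abs().mean(dim=dims, keepdim=True) + eps
    return ((pred - target).abs() / scale).mean()
```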
The model was pretrained on 96 NVIDIA H100 GPUs using distributed HSDP (4 GPUs per shard group), with the sampling strategy matched to the sharding structure to minimize deadweight loss.
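For reference, a hedged sketch of a hybrid-sharded setup with PyTorch FSDP, assuming `torch.distributed` is already initialized; the wrapping policy and mesh construction in the actual Walrus training code may differ:

```python
import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy

# 96 GPUs arranged as 24 replica groups x 4-way sharding: parameters are
# sharded within each 4-GPU group and replicated across groups.
# `model` is a placeholder for the module being wrapped.
mesh = init_device_mesh("cuda", (24, 4), mesh_dim_names=("replicate", "shard"))
model = FSDP(model, device_mesh=mesh, sharding_strategy=ShardingStrategy.HYBRID_SHARD)
```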
Intended Use
This pretrained checkpoint is suitable for:
✅ Next-step prediction
✅ Fast surrogate simulation
✅ Autoregressive rollout of physical systems
✅ Transfer learning to new physical settings
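A minimal rollout loop tying these together (again with `model` as a stand-in for Walrus and the tensor layout assumed):

```python
import torch

@torch.no_grad()
def rollout(model, history: torch.Tensor, n_steps: int) -> list[torch.Tensor]:
    """Autoregressive rollout: predict u(t+1) = u(t) + M(U(t)), then slide
    the history window forward and repeat.

    history: (batch, tau, channels, *spatial) initial window U(t).
    """
    states = []
    for _ in range(n_steps):
        u_next = history[:, -1] + model(history)  # residual next-step prediction
        states.append(u_next)
        # Drop the oldest snapshot and append the new prediction.
        history = torch.cat([history[:, 1:], u_next.unsqueeze(1)], dim=1)
    return states
```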
Resources
Paper: https://arxiv.org/pdf/2511.15684
GitHub: https://github.com/PolymathicAI/walrus
Tutorial: https://github.com/PolymathicAI/walrus/demo_notebooks
Note: the training code in the repository is closely coupled with tools from the Well, so it can be beneficial to format data to match that schema. If that's not possible, the tutorial shows how to use the model without Well-formatted data.
Demonstrated downstream tasks
We demonstrate Walrus's strong performance by finetuning it on a range of challenging downstream tasks, as shown in the paper. Paths to the finetuned Walrus checkpoints for the various downstream tasks are as follows:
PDEGym CE-RM: https://huggingface.co/polymathic-ai/walrus_ft_CE-RM/tree/main
PDEBench CNS Turbulent: https://huggingface.co/polymathic-ai/walrus_ft_CNS3D_64_Turb/tree/main
PDEBench CNS Random: https://huggingface.co/polymathic-ai/walrus_ft_CNS3D_128_Rand/tree/main
FlowBench FPO Skelenton: https://huggingface.co/polymathic-ai/walrus_ft_flowbench_skelenton/tree/main
The Well Postmerger Neutron Star: https://huggingface.co/polymathic-ai/walrus_ft_post_neutron_star_merger/tree/main
The Well Convective envelope RSG: https://huggingface.co/polymathic-ai/walrus_ft_convective_envelope_rsg/tree/main
PDEArena Conditioned Incompressible NS: https://huggingface.co/polymathic-ai/walrus_ft_pdearena_ins/tree/main
BubbleML 2.0 PoolBoil Subcooled: https://huggingface.co/polymathic-ai/walrus_ft_bubbleML_poolboil/tree/main
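Any of these can be fetched with the standard Hub API, e.g. via `snapshot_download`; only the download step is shown here, since loading the weights is covered by the tutorial notebooks:

```python
from huggingface_hub import snapshot_download

# Download one of the finetuned checkpoints listed above to the local cache.
local_dir = snapshot_download(repo_id="polymathic-ai/walrus_ft_CE-RM")
print(local_dir)
```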
Additional checkpoints not included in the Walrus collection on HF can be found here, though the endpoint is a bit finicky.
More finetuning checkpoints will continue to be added to HF over time.