Walrus: A Cross-Domain Foundation Model for Continuum Dynamics
Walrus is a large-scale physics foundation model capable of modeling a broad range of continuum dynamical systems.
Walrus is trained jointly across 19 diverse physical datasets spanning:
- astrophysics
- geoscience
- rheology
- plasma physics
- acoustics
- classical fluids
These systems have diverse boundary conditions and physical parameterizations. The model is optimized to serve as a general-purpose surrogate for physical simulation and a strong initialization for downstream fine-tuning on new PDE systems.
Model Description
Walrus is a 1.3B-parameter space-time Transformer trained autoregressively to predict the temporal evolution of physical fields. A simulation snapshot at time t is written as u(t).
We define the difference between two consecutive snapshots as: Δu(t+1) = u(t+1) − u(t)
Given a short history of τ snapshots: U(t) = [u(t − τ + 1), ..., u(t)]
The model predicts the next state using: u(t+1) ≈ u(t) + M(U(t))
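As a minimal illustration of this update rule, the sketch below performs one prediction step in PyTorch. The `(batch, time, channel, *spatial)` tensor layout and the name `predict_next_state` are illustrative assumptions, not the actual Walrus API; `model` stands in for the network M.

```python
import torch

def predict_next_state(model, history: torch.Tensor) -> torch.Tensor:
    """One autoregressive step: u(t+1) ~= u(t) + M(U(t)).

    history: (batch, tau, channels, *spatial), the last tau snapshots U(t).
    `model` is a placeholder for the Walrus network M.
    """
    u_t = history[:, -1]    # most recent snapshot u(t)
    delta = model(history)  # network predicts the increment Delta-u(t+1)
    return u_t + delta      # residual update yields u(t+1)
```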
Key architectural components
Adaptive-compute patch embedding
- Token count automatically balanced across resolutions
- Enables mixing 2D and 3D datasets efficiently
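The balancing idea can be sketched as follows: pick a per-axis patch size so that the resulting token count stays near a fixed budget regardless of resolution or dimensionality. The heuristic and the `token_budget` value below are hypothetical, not the exact Walrus rule.

```python
import math

def choose_patch_size(spatial_shape: tuple[int, ...], token_budget: int = 4096) -> tuple[int, ...]:
    """Pick a per-axis patch size so that prod(dim // patch) stays close to
    token_budget across input resolutions (hypothetical heuristic)."""
    cells_per_token = max(1.0, math.prod(spatial_shape) / token_budget)
    per_axis = max(1, round(cells_per_token ** (1 / len(spatial_shape))))
    return tuple(min(per_axis, d) for d in spatial_shape)

# A 2D 512x512 field and a 3D 64^3 volume get comparable token counts:
print(choose_patch_size((512, 512)))    # (8, 8)    -> 64*64 = 4096 tokens
print(choose_patch_size((64, 64, 64)))  # (4, 4, 4) -> 16^3  = 4096 tokens
```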
Patch Jittering
- A harmonic-analysis-motivated augmentation technique
- Reduces aliasing and spectral artifacts
- Improves long-horizon stability across 17/19 pretraining datasets
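A minimal sketch of the augmentation, assuming periodic boundaries so that a circular shift (`torch.roll`) is a valid transformation of the field:

```python
import torch

def jitter_patches(u: torch.Tensor, patch: tuple[int, ...]) -> torch.Tensor:
    """Shift the field by a uniform random per-axis offset smaller than the
    patch size before patch extraction, so patch boundaries do not always
    fall on the same grid lines (illustrative sketch)."""
    offsets = [int(torch.randint(0, p, ()).item()) for p in patch]
    spatial_dims = list(range(u.ndim - len(patch), u.ndim))  # trailing axes
    return torch.roll(u, shifts=offsets, dims=spatial_dims)
```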
Tensor-law-aware data augmentation
- 2D data embedded into 3D through plane rotations
- Vector/tensor fields rotated with correct physical transformations
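For intuition, the sketch below applies a 90-degree rotation about the z-axis to a 3D vector field, rotating the grid and the vector components together so that v'(x) = R v(R^-1 x). This is an illustrative special case, not the training augmentation code.

```python
import torch

def rotate_vector_field_90z(v: torch.Tensor) -> torch.Tensor:
    """Rotate a vector field 90 degrees about the z-axis.

    v: (3, X, Y, Z) with channel order (vx, vy, vz). Both the sample
    locations and the components must transform, or the field is no
    longer physically consistent.
    """
    # Rotate the grid in the x-y plane: (x, y) -> (-y, x).
    v = torch.rot90(v, k=1, dims=(1, 2))
    # Apply the same rotation R to the components: (vx, vy) -> (-vy, vx).
    vx, vy, vz = v[0], v[1], v[2]
    return torch.stack([-vy, vx, vz], dim=0)
```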
Asymmetric normalization
- Inputs are normalized by their RMS computed over space and time
- The predicted Δu is de-normalized using the RMS of Δu
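A sketch of this scheme, assuming a `(batch, time, channel, *spatial)` layout with statistics kept per field (channel):

```python
import torch

def normalize_inputs(history: torch.Tensor, eps: float = 1e-6):
    """Normalize each field by its RMS over space and time, returning the
    scale so predictions can be mapped back to physical units."""
    dims = (1, *range(3, history.ndim))  # reduce over time + spatial axes
    rms_u = history.pow(2).mean(dim=dims, keepdim=True).sqrt() + eps
    return history / rms_u, rms_u

def denormalize_delta(pred_delta: torch.Tensor, history: torch.Tensor, eps: float = 1e-6):
    """Rescale the predicted increment by the RMS of the *differences*
    Delta-u in the history window, not the RMS of u itself."""
    deltas = history[:, 1:] - history[:, :-1]
    dims = (1, *range(3, history.ndim))
    rms_d = deltas.pow(2).mean(dim=dims, keepdim=True).sqrt() + eps
    return pred_delta * rms_d.squeeze(1)  # (B, C, 1, ...) broadcasts over space
```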
Pretraining Details
Walrus is pretrained on 19 physical datasets with:
- Loss: Per-field normalized L1 loss (see the sketch after this list)
- Optimizer: AdamW
- Batching: System-uniform hierarchical sampling
- Time-striding: Random stride (1-5) per training example
- Patch jitter range: Uniform per-axis random offset
- Dimensional unification: 2D fields embedded as thin 3D volumes
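The loss referenced above can be sketched as follows; the exact normalization used in training may differ, and the `(batch, channel, *spatial)` layout is assumed:

```python
import torch

def normalized_l1_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Per-field normalized L1: each field's absolute error is scaled by
    that field's mean magnitude, so fields with very different units
    contribute comparably to the total loss."""
    dims = (0, *range(2, pred.ndim))  # reduce over everything but channel
    scale = target.abs().mean(dim=dims, keepdim=True) + eps
    return ((pred - target).abs() / scale).mean()
```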
The model was pretrained on 96 NVIDIA H100 GPUs using distributed HSDP (4 GPUs per shard group), with the sampling strategy matched to the sharding structure to minimize deadweight loss.
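For reference, a hedged sketch of a hybrid-sharded setup with PyTorch FSDP, assuming `torch.distributed` is already initialized; the wrapping policy and mesh construction in the actual Walrus training code may differ:

```python
import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy

# 96 GPUs arranged as 24 replica groups x 4-way sharding: parameters are
# sharded within each 4-GPU group and replicated across groups.
# `model` is a placeholder for the module being wrapped.
mesh = init_device_mesh("cuda", (24, 4), mesh_dim_names=("replicate", "shard"))
model = FSDP(model, device_mesh=mesh, sharding_strategy=ShardingStrategy.HYBRID_SHARD)
```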
Intended Use
This pretrained checkpoint is suitable for:
✅ Next-step prediction
✅ Fast surrogate simulation
✅ Autoregressive rollout of physical systems
✅ Transfer learning to new physical settings
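A minimal rollout loop tying these together (again with `model` as a stand-in for Walrus and the tensor layout assumed):

```python
import torch

@torch.no_grad()
def rollout(model, history: torch.Tensor, n_steps: int) -> list[torch.Tensor]:
    """Autoregressive rollout: predict u(t+1) = u(t) + M(U(t)), then slide
    the history window forward and repeat.

    history: (batch, tau, channels, *spatial) initial window U(t).
    """
    states = []
    for _ in range(n_steps):
        u_next = history[:, -1] + model(history)  # residual next-step prediction
        states.append(u_next)
        # Drop the oldest snapshot and append the new prediction.
        history = torch.cat([history[:, 1:], u_next.unsqueeze(1)], dim=1)
    return states
```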
Resources
Paper: https://arxiv.org/pdf/2511.15684
GitHub: https://github.com/PolymathicAI/walrus
Tutorial: https://github.com/PolymathicAI/walrus/demo_notebooks
Note: the training code in the repository is closely coupled with tools from the Well, so it can be beneficial to format data to match that schema. If that's not possible, the tutorial shows how to use the model without Well-formatted data.
Demonstrated downstream tasks
We demonstrate Walrus's strong performance by finetuning it on a range of challenging downstream tasks, as shown in the paper. Paths to the finetuned Walrus checkpoints for the various downstream tasks are as follows:
PDEGym CE-RM: https://huggingface.co/polymathic-ai/walrus_ft_CE-RM/tree/main
PDEBench CNS Turbulent: https://huggingface.co/polymathic-ai/walrus_ft_CNS3D_64_Turb/tree/main
PDEBench CNS Random: https://huggingface.co/polymathic-ai/walrus_ft_CNS3D_128_Rand/tree/main
FlowBench FPO Skelenton: https://huggingface.co/polymathic-ai/walrus_ft_flowbench_skelenton/tree/main
The Well Postmerger Neutron Star: https://huggingface.co/polymathic-ai/walrus_ft_post_neutron_star_merger/tree/main
The Well Convective envelope RSG: https://huggingface.co/polymathic-ai/walrus_ft_convective_envelope_rsg/tree/main
PDEArena Conditioned Incompressible NS: https://huggingface.co/polymathic-ai/walrus_ft_pdearena_ins/tree/main
BubbleML 2.0 PoolBoil Subcooled: https://huggingface.co/polymathic-ai/walrus_ft_bubbleML_poolboil/tree/main
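Any of these can be fetched with the standard Hub API, e.g. via `snapshot_download`; only the download step is shown here, since loading the weights is covered by the tutorial notebooks:

```python
from huggingface_hub import snapshot_download

# Download one of the finetuned checkpoints listed above to the local cache.
local_dir = snapshot_download(repo_id="polymathic-ai/walrus_ft_CE-RM")
print(local_dir)
```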
Additional checkpoints not included in the Walrus collection on HF can be found here, though the endpoint is a bit finicky.
More finetuning checkpoints will continue to be added to HF over time.