FAMA: A Scalable Foundational Astronomical Masked Autoencoder

FAMA (Foundational Astronomical Masked Autoencoder) is a self-supervised foundational image model based on the Masked Autoencoder (MAE) architecture and optimized for the unique properties of astronomical data. It is designed to address the challenge posed by the heterogeneous, unlabelled image datasets accumulating from wide-field surveys such as the DESI Legacy Imaging Surveys and the upcoming Chinese Space Station Telescope (CSST).

The model achieves robust, generalized feature extraction by pre-training the Vision Transformer (ViT) encoder using a high-ratio masking strategy.

💡 Key Results and Highlights

  • Superior Performance: FAMA yields significant performance gains over supervised baselines in downstream tasks like galaxy classification, object detection, and redshift estimation.
  • Robust Transferability: It demonstrates effective transferability from DESI to SDSS data, successfully mitigating the domain shift problem between different observational instruments.
  • Optimal MAE Configuration: The model uses an MAE configuration tuned for astronomical data: a 75% masking ratio and a lightweight decoder with a single layer and a 512-dimensional width (a configuration sketch follows this list).
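
A minimal pre-training configuration sketch, under the assumption that FAMA can be expressed with the reference MAE implementation (models_mae.py from the facebookresearch/mae repository); treating FAMA as a drop-in use of that code is an assumption, and the snippet only illustrates the 75% masking ratio and the 1-layer, 512-dimensional decoder:

```python
from functools import partial

import torch
import torch.nn as nn
from models_mae import MaskedAutoencoderViT  # from the facebookresearch/mae repository

# ViT-Base encoder with FAMA's decoder settings: a single decoder block of width 512.
mae = MaskedAutoencoderViT(
    img_size=224, patch_size=16, in_chans=3,
    embed_dim=768, depth=12, num_heads=12,        # ViT-Base encoder
    decoder_embed_dim=512, decoder_depth=1,       # lightweight 1-layer, 512-dim decoder
    decoder_num_heads=16, mlp_ratio=4,
    norm_layer=partial(nn.LayerNorm, eps=1e-6),
)

images = torch.randn(8, 3, 224, 224)              # dummy batch of g, r, z cutouts
loss, pred, mask = mae(images, mask_ratio=0.75)   # 75% of patches are masked at forward time
```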

🌌 FAMA Architecture Specifications

FAMA adopts an asymmetric encoder-decoder architecture, utilizing standard ViT models (ViT-B, ViT-L, ViT-H) for the encoder backbone. The lightweight decoder is discarded after pre-training.

| Architecture | Layers | Patch Size | Embed Dim | MLP Size | Heads | Parameters |
|---|---|---|---|---|---|---|
| ViT-Base (FAMA-B) | 12 | 16 | 768 | 3,072 | 12 | 86M |
| ViT-Large (FAMA-L) | 24 | 16 | 1,024 | 4,096 | 16 | 303M |
| ViT-Huge (FAMA-H) | 24 | 14 | 1,536 | 6,144 | 16 | 680M |

📦 Model Weights (Pre-trained Encoder Only)

The weights provided below are the pre-trained encoders (ViT-B, ViT-L, ViT-H) from the self-supervised MAE phase, ready for transfer learning via fine-tuning or linear probing.

| Model | Weights File | Pre-train Data |
|---|---|---|
| FAMA-B | base_patch16.pth | DESI-1M |
| FAMA-L | large_patch16.pth | DESI-1M |
| FAMA-H | huge_patch14.pth | DESI-1M |

Note: DESI-1M is the 1-million-sample subset used for the pre-training experiment, drawn from a parent sample of 2 million galaxies randomly selected from the DESI Legacy Imaging Surveys DR9 and augmented with background cutouts.
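
For linear probing, an encoder checkpoint can be loaded into a standard ViT and frozen, training only a newly attached linear head. A minimal sketch using timm; the checkpoint key layout ('model') and the timm model name are assumptions, so adjust them to the actual file:

```python
import timm
import torch

# Build a standard ViT-L/16 and load the FAMA-L encoder weights.
encoder = timm.create_model("vit_large_patch16_224", num_classes=5)  # 5 = your number of classes
ckpt = torch.load("large_patch16.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt)                 # assumed key layout
encoder.load_state_dict(state_dict, strict=False)    # the new head stays randomly initialized

# Freeze everything except the linear head (linear probing).
for name, param in encoder.named_parameters():
    param.requires_grad = name.startswith("head")
```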

📈 Performance Benchmarks

FAMA models were rigorously validated across three distinct transfer learning tasks: Classification (Galaxy Morphology), Regression (Photometric Redshift), and Detection (Gravitational Lensing).

1. Galaxy Classification (Full Fine-tuning)

| Method | Backbone | Pre-train Data | Acc. (%) on galaxy-desi | Acc. (%) on galaxy-sdss |
|---|---|---|---|---|
| FAMA (ours) | ViT-H | DESI-1M | 89.10 | 96.02 |

2. Gravitational Lensing Detection

FAMA achieves the highest Average Precision (AP) scores for strong gravitational lensing detection using the ViTDet adaptation.

| Method | Backbone | AP | AP75 |
|---|---|---|---|
| FAMA (ours) | ViT-H | 42.62 | 49.43 |

3. Redshift Prediction (Cross-Domain)

The model pre-trained on DESI data is fine-tuned on the SDSS redshift dataset; the metric definitions are sketched after the table.

| Method | Backbone | Δz (bias, lower is better) | σMAD (dispersion, lower is better) |
|---|---|---|---|
| FAMA | ViT-H | 0.51 × 10⁻⁴ | 0.56 × 10⁻² |
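
For reference, a sketch of how these two metrics are commonly computed in the photometric-redshift literature; the exact definitions used in the paper may differ slightly, and the convention assumed here is that Δz is the residual normalized by 1 + z_true and σMAD is the normalized median absolute deviation of that residual:

```python
import numpy as np

def photoz_metrics(z_pred: np.ndarray, z_true: np.ndarray):
    """Bias and sigma_MAD of the normalized redshift residuals (standard conventions assumed)."""
    dz = (z_pred - z_true) / (1.0 + z_true)                            # normalized residual
    bias = float(np.mean(dz))                                          # Δz (bias)
    sigma_mad = float(1.4826 * np.median(np.abs(dz - np.median(dz))))  # σMAD (dispersion)
    return bias, sigma_mad
```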

πŸ› οΈ How to Use for Transfer Learning

The following steps outline the use of the FAMA encoder weights for fine-tuning on a downstream task (e.g., classification).

1. Preprocessing

The model was pre-trained using the following data processing steps:

  1. Input image size: 3 × 256 × 256 pixels, extracted at 0.262 arcsec/pixel in the g, r, and z bands.
  2. Normalization: channel-wise mean and standard deviation calculated from the DESI-2M dataset.
  3. The final input to the ViT is 224 × 224 pixels, typically obtained by resizing the 256 × 256 cutouts (a preprocessing sketch follows this list).
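
A preprocessing sketch matching these steps, using torchvision; the normalization constants below are placeholders and must be replaced with the channel-wise statistics computed from the DESI-2M sample:

```python
from torchvision import transforms

DESI_MEAN = (0.0, 0.0, 0.0)   # placeholder g, r, z channel means (substitute DESI-2M statistics)
DESI_STD  = (1.0, 1.0, 1.0)   # placeholder g, r, z channel standard deviations

preprocess = transforms.Compose([
    transforms.ToTensor(),                               # H x W x 3 (g, r, z) cutout -> 3 x 256 x 256 tensor
    transforms.Resize((224, 224), antialias=True),       # match the ViT input resolution
    transforms.Normalize(mean=DESI_MEAN, std=DESI_STD),  # channel-wise normalization
])
```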

2. Fine-Tuning Setup

Load the weights into a standard ViT encoder and attach a task-specific head; a minimal loading example follows the list below.

  • Classification: Attach a Linear Layer to the ViT's final [CLS] token output. Use Cross-Entropy loss.
  • Redshift Regression: Attach a Linear Regression Head to the ViT's final [CLS] token output. Use Mean Squared Error (MSE) loss.
  • Object Detection: Adapt the ViT to the ViTDet framework, which builds a multi-scale feature pyramid from the ViT blocks.
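
A minimal full-fine-tuning sketch for the classification case, again using timm; the checkpoint key layout and the timm model name are assumptions. For redshift regression, set num_classes=1 and swap the loss for nn.MSELoss():

```python
import timm
import torch
import torch.nn as nn

# Standard ViT-B/16 with a fresh classification head; load the FAMA-B encoder weights.
model = timm.create_model("vit_base_patch16_224", num_classes=10)  # 10 = your number of classes
ckpt = torch.load("base_patch16.pth", map_location="cpu")
model.load_state_dict(ckpt.get("model", ckpt), strict=False)       # head weights stay newly initialized

criterion = nn.CrossEntropyLoss()  # classification objective

def train_step(images, labels, optimizer):
    """One optimization step on a batch of 3 x 224 x 224 cutouts."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```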

3. Hyperparameters (Example for Classification)

The following fine-tuning configurations were used for the galaxy-desi classification task (an optimizer and scheduler sketch follows the table):

| Config | ViT-Base | ViT-Large | ViT-Huge |
|---|---|---|---|
| Optimizer | AdamW | AdamW | AdamW |
| Learning Rate | 1.5 × 10⁻³ | 2 × 10⁻³ | 1 × 10⁻³ |
| Batch Size | 64 | 64 | 32 |
| Training Epochs | 50 | 50 | 50 |
| LR Schedule | Cosine Decay | Cosine Decay | Cosine Decay |
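
A sketch of the ViT-Base recipe from this table as a plain AdamW plus cosine-annealing setup; the weight-decay value and the absence of warm-up are assumptions, and the model would be the fine-tuning model from the previous sketch:

```python
import timm
import torch

model = timm.create_model("vit_base_patch16_224", num_classes=10)  # FAMA-B encoder + task head (weights loaded as above)
optimizer = torch.optim.AdamW(model.parameters(), lr=1.5e-3, weight_decay=0.05)  # weight decay is an assumption
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50)      # cosine decay over 50 epochs

for epoch in range(50):
    # ... one pass over galaxy-desi with batch size 64, calling train_step(...) per batch ...
    scheduler.step()
```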

🔗 Citation

If you use FAMA in your research, please cite the associated work:

@article{FAMA_2025,
  title={FAMA -- a Scalable Foundational Astronomical Masked Autoencoder for Astronomical Image Analysis},
  author={Lv, Jiameng and Li, Xu and Cao, Liang and Gao, Xi and Li, Nan and Fu, Mingxiang and Li, Yushan and Duan, Manni and Jia, Peng},
  journal={Preprint submitted to Elsevier},
  year={2025}
}