FAMA: A Scalable Foundational Astronomical Masked Autoencoder
FAMA (Foundational Astronomical Masked Autoencoder) is a self-supervised, foundational image model based on the Masked Autoencoder (MAE) architecture, optimized for the unique properties of astronomical data. It is designed to overcome the challenge of heterogeneous, unlabelled image datasets accumulating from wide-field surveys like the DESI Legacy Imaging Surveys and the upcoming Chinese Space Station Telescope (CSST).
The model achieves robust, generalized feature extraction by pre-training the Vision Transformer (ViT) encoder using a high-ratio masking strategy.
💡 Key Results and Highlights
- Superior Performance: FAMA yields significant performance gains over supervised baselines in downstream tasks like galaxy classification, object detection, and redshift estimation.
- Robust Transferability: It demonstrates effective transferability from DESI to SDSS data, successfully mitigating the domain shift problem between different observational instruments.
- Optimal MAE Configuration: FAMA uses an MAE configuration tuned to astronomical data: a 75% masking ratio and a lightweight decoder that is 1 layer deep and 512 dimensions wide (see the masking sketch after this list).
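The high-ratio masking at the heart of pre-training is easy to illustrate. Below is a minimal PyTorch sketch of per-sample random masking at a 75% ratio, written for this README rather than taken from the FAMA codebase; the shapes assume the ViT-B setting (a 256 × 256 image split into 256 patches of embedding dimension 768) and the function name is illustrative.

```python
import torch

def random_masking(tokens: torch.Tensor, mask_ratio: float = 0.75):
    """Randomly drop a fraction of patch tokens per sample (MAE-style).

    tokens: (batch, num_patches, embed_dim) patch embeddings.
    Returns the kept tokens, a binary mask (1 = removed), and the
    indices needed to restore the original patch order.
    """
    batch, num_patches, embed_dim = tokens.shape
    num_keep = int(num_patches * (1.0 - mask_ratio))

    # A random score per patch; sorting it gives a random permutation.
    noise = torch.rand(batch, num_patches, device=tokens.device)
    ids_shuffle = torch.argsort(noise, dim=1)
    ids_restore = torch.argsort(ids_shuffle, dim=1)

    # Keep only the first `num_keep` patches of the shuffled order.
    ids_keep = ids_shuffle[:, :num_keep]
    tokens_kept = torch.gather(
        tokens, 1, ids_keep.unsqueeze(-1).expand(-1, -1, embed_dim)
    )

    # Binary mask in the original patch order: 0 = kept, 1 = masked.
    mask = torch.ones(batch, num_patches, device=tokens.device)
    mask[:, :num_keep] = 0
    mask = torch.gather(mask, 1, ids_restore)
    return tokens_kept, mask, ids_restore

# Example: a 256x256 image with 16-pixel patches gives a 16x16 grid = 256 patches.
x = torch.randn(4, 256, 768)                      # (batch, patches, embed_dim), ViT-B
kept, mask, ids_restore = random_masking(x, mask_ratio=0.75)
print(kept.shape)                                 # torch.Size([4, 64, 768])
```

Only the visible 25% of patches pass through the encoder; the lightweight decoder reconstructs the masked patches during pre-training and is discarded afterwards.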
🏗️ FAMA Architecture Specifications
FAMA adopts an asymmetric encoder-decoder architecture, utilizing standard ViT models (ViT-B, ViT-L, ViT-H) for the encoder backbone. The lightweight decoder is discarded after pre-training.
| Architecture | Layers | Patch Size | Embed Dim | MLP Size | Heads | Parameters |
|---|---|---|---|---|---|---|
| ViT-Base (FAMA-B) | 12 | 16 | 768 | 3,072 | 12 | 86M |
| ViT-Large (FAMA-L) | 24 | 16 | 1,024 | 4,096 | 16 | 303M |
| ViT-Huge (FAMA-H) | 24 | 14 | 1,536 | 6,144 | 16 | 680M |
📦 Model Weights (Pre-trained Encoder Only)
The weights provided below are the pre-trained encoders (ViT-B, ViT-L, ViT-H) from the self-supervised MAE phase, ready for transfer learning via fine-tuning or linear probing (a loading sketch follows the table).
| Model Size | Weights File | Pre-train Data |
|---|---|---|
| FAMA-B | base_patch16.pth | DESI-1M |
| FAMA-L | large_patch16.pth | DESI-1M |
| FAMA-H | huge_patch14.pth | DESI-1M |
Note: The pre-training pool was a random sample of 2 million galaxies from the DESI Legacy Imaging Surveys DR9, augmented with background cutouts (the DESI-2M set referenced below). DESI-1M is the 1-million-sample subset actually used for the pre-training experiments.
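To begin transfer learning, load a checkpoint into a matching ViT encoder. The sketch below is a minimal example that assumes a timm ViT-B/16 backbone and an MAE-style checkpoint whose encoder weights may be nested under a 'model' key; both the timm model name and the checkpoint layout are assumptions to verify against the actual file.

```python
import torch
import timm

# Build a ViT-B/16 backbone matching FAMA-B (assumed timm model name).
model = timm.create_model("vit_base_patch16_224", pretrained=False, num_classes=0)

# Load the pre-trained encoder weights (checkpoint layout is assumed).
checkpoint = torch.load("base_patch16.pth", map_location="cpu")
state_dict = checkpoint.get("model", checkpoint)   # unwrap if nested under "model"

# strict=False tolerates keys (e.g. decoder weights) that the encoder does not use.
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print(f"missing: {len(missing)}, unexpected: {len(unexpected)}")
```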
📊 Performance Benchmarks
FAMA models were rigorously validated across three distinct transfer learning tasks: Classification (Galaxy Morphology), Regression (Photometric Redshift), and Detection (Gravitational Lensing).
1. Galaxy Classification (Full Fine-tuning)
| Method | Backbone | Pre-train Data | Accuracy (%) on galaxy-desi | Accuracy (%) on galaxy-sdss |
|---|---|---|---|---|
| FAMA (ours) | ViT-H | DESI-1M | 89.10 | 96.02 |
2. Gravitational Lensing Detection
FAMA achieves the highest Average Precision (AP) scores for strong gravitational lensing detection using the ViTDet adaptation.
| Method | Backbone | AP | AP75 |
|---|---|---|---|
| FAMA (ours) | ViT-H | 42.62 | 49.43 |
3. Redshift Prediction (Cross-Domain)
The model pre-trained on DESI data is fine-tuned on the SDSS Redshift dataset; a sketch of the reported metrics follows the table.
| Backbone | Δz (Bias, Lower is Better) | σ_MAD (Dispersion, Lower is Better) |
|---|---|---|
| FAMA ViT-H | 0.51 × 10⁻⁴ | 0.56 × 10⁻² |
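For reference, the sketch below computes these two quantities under the common photometric-redshift conventions (residuals normalized by 1 + z_true, and σ_MAD as the normalized median absolute deviation); confirm against the paper that FAMA reports exactly these definitions.

```python
import numpy as np

def photoz_metrics(z_pred: np.ndarray, z_true: np.ndarray):
    """Bias and NMAD dispersion of photometric-redshift residuals.

    Assumes the common conventions:
        dz        = (z_pred - z_true) / (1 + z_true)
        bias      = mean(dz)
        sigma_MAD = 1.4826 * median(|dz - median(dz)|)
    """
    dz = (z_pred - z_true) / (1.0 + z_true)
    bias = float(np.mean(dz))
    sigma_mad = float(1.4826 * np.median(np.abs(dz - np.median(dz))))
    return bias, sigma_mad

# Toy example with synthetic redshifts
rng = np.random.default_rng(0)
z_true = rng.uniform(0.0, 0.8, size=10_000)
z_pred = z_true + rng.normal(0.0, 0.01, size=z_true.size) * (1 + z_true)
print(photoz_metrics(z_pred, z_true))
```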
🛠️ How to Use for Transfer Learning
The following steps outline the use of the FAMA encoder weights for fine-tuning on a downstream task (e.g., classification).
1. Preprocessing
The model was pre-trained using the following data processing steps:
- Input image size: 3 × 256 × 256 pixels, extracted at 0.262 arcsec/pixel in the g, r, and z bands.
- Normalization: The training utilized channel-wise mean and standard deviation calculated from the DESI-2M dataset.
- Final input to the ViT is 224 × 224; the 256 × 256 cutouts are brought to this size via a resizing step (see the preprocessing sketch below).
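Below is a minimal torchvision preprocessing pipeline consistent with the steps above. The channel-wise mean and standard deviation are placeholders (the actual DESI-2M statistics are not listed in this README), and resizing from 256 × 256 to 224 × 224 is one reasonable way to match the ViT input size.

```python
import numpy as np
from torchvision import transforms

# Placeholder statistics: substitute the per-band (g, r, z) mean and std
# computed from the DESI-2M dataset.
DESI_MEAN = (0.0, 0.0, 0.0)   # placeholder values
DESI_STD = (1.0, 1.0, 1.0)    # placeholder values

preprocess = transforms.Compose([
    transforms.ToTensor(),                           # HWC array -> CHW float tensor
    transforms.Resize((224, 224), antialias=True),   # 256x256 cutout -> ViT input size
    transforms.Normalize(mean=DESI_MEAN, std=DESI_STD),
])

# Example: a fake 3-band (g, r, z) 256x256 cutout
cutout = np.random.rand(256, 256, 3).astype("float32")
x = preprocess(cutout)
print(x.shape)                                       # torch.Size([3, 224, 224])
```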
2. Fine-Tuning Setup
Load the weights into a standard ViT encoder and attach a task-specific head.
- Classification: Attach a Linear Layer to the ViT's final [CLS] token output and train with Cross-Entropy loss (see the sketch after this list).
- Redshift Regression: Attach a Linear Regression Head to the ViT's final [CLS] token output. Use Mean Squared Error (MSE) loss.
- Object Detection: Adapt the ViT to the ViTDet framework, which builds a multi-scale feature pyramid from the ViT blocks.
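As a concrete illustration of the classification case, the sketch below wraps a ViT backbone with a linear head on the [CLS] token. The `FAMAClassifier` wrapper and the use of timm's `forward_features` (assumed here to return the full token sequence with the [CLS] token first, as in recent timm versions) are illustrative choices, not the authors' code.

```python
import torch
import torch.nn as nn
import timm

class FAMAClassifier(nn.Module):
    """Pre-trained ViT encoder + linear head on the [CLS] token (illustrative)."""

    def __init__(self, backbone: nn.Module, embed_dim: int, num_classes: int):
        super().__init__()
        self.backbone = backbone
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tokens = self.backbone.forward_features(x)   # (B, 1 + num_patches, D) in recent timm
        cls_token = tokens[:, 0]                     # the [CLS] token
        return self.head(cls_token)

# Example: ViT-B encoder (embed dim 768) with a 10-class head and Cross-Entropy loss.
backbone = timm.create_model("vit_base_patch16_224", pretrained=False, num_classes=0)
model = FAMAClassifier(backbone, embed_dim=768, num_classes=10)
criterion = nn.CrossEntropyLoss()

logits = model(torch.randn(2, 3, 224, 224))
loss = criterion(logits, torch.tensor([1, 7]))
```

For regression, swapping the head for `nn.Linear(embed_dim, 1)` and the loss for `nn.MSELoss()` follows the same pattern; the ViTDet adaptation for detection is more involved and is not sketched here.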
3. Hyperparameters (Example for Classification)
The following fine-tuning configurations were used for the galaxy-desi classification task (a code sketch follows the table):
| Config | ViT-Base | ViT-Large | ViT-Huge |
|---|---|---|---|
| Optimizer | AdamW | AdamW | AdamW |
| Learning Rate | 1.5 × 10⁻³ | 2 × 10⁻³ | 1 × 10⁻³ |
| Batch Size | 64 | 64 | 32 |
| Training Epochs | 50 | 50 | 50 |
| LR Schedule | Cosine Decay | Cosine Decay | Cosine Decay |
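Putting the table into code, the snippet below sets up AdamW with a cosine-decay schedule for the ViT-Base column. The weight-decay value, the per-epoch scheduler step, and the stand-in model and data are assumptions; substitute the FAMA encoder with its classification head and the real galaxy-desi loader.

```python
import torch
import torch.nn as nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR
from torch.utils.data import DataLoader, TensorDataset

# Tiny stand-in model and dataset so the loop runs end to end.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 10))
criterion = nn.CrossEntropyLoss()
train_loader = DataLoader(
    TensorDataset(torch.randn(128, 3, 224, 224), torch.randint(0, 10, (128,))),
    batch_size=64, shuffle=True,                     # batch size 64 (ViT-Base column)
)

optimizer = AdamW(model.parameters(), lr=1.5e-3, weight_decay=0.05)  # weight decay assumed
scheduler = CosineAnnealingLR(optimizer, T_max=50)                   # cosine decay over 50 epochs

for epoch in range(50):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()                                 # step the schedule once per epoch
```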
📝 Citation
If you use FAMA in your research, please cite the associated work:
@article{FAMA_2025,
title={FAMA -- a Scalable Foundational Astronomical Masked Autoencoder for Astronomical Image Analysis},
author={Lv, Jiameng and Li, Xu and Cao, Liang and Gao, Xi and Li, Nan and Fu, Mingxiang and Li, Yushan and Duan, Manni and Jia, Peng},
journal={Preprint submitted to Elsevier},
year={2025}
}