DermaJEPA headline predictors

This repository holds the trained JEPA-style linear predictor heads behind the headline results of the DermaJEPA preliminary report. Each checkpoint is a small linear map g(z) = z W + b (with W warm-started at the identity) applied to the L2-normalised output of a frozen vision backbone. The predictors are trained to minimise || g(z_context) - z_target ||^2 on stable image pairs and are scored at evaluation by the cosine distance between g(z_context) and z_target.
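The objective above can be illustrated with a minimal NumPy sketch. It assumes plain full-batch gradient descent on synthetic data; the function names, learning rate, step count, and toy embeddings are all hypothetical, and the optimiser actually used for the published checkpoints is defined in the repository, not here.

```python
import numpy as np

def fit_linear_predictor(z_context, z_target, lr=0.1, steps=200):
    """Fit g(z) = z @ W + b by gradient descent on the mean squared error.

    Illustrative only: the optimiser, learning rate and step count are
    assumptions, not the settings used for the published checkpoints.
    """
    n, d = z_context.shape
    W = np.eye(d)        # identity warm start, as described in the card
    b = np.zeros(d)
    for _ in range(steps):
        residual = z_context @ W + b - z_target          # (n, d)
        # Gradients of (1/n) * sum_i ||g(z_i) - z_target_i||^2
        W -= lr * (2.0 / n) * (z_context.T @ residual)
        b -= lr * (2.0 / n) * residual.sum(axis=0)
    return W, b

def mse(z_context, z_target, W, b):
    """Mean over pairs of || g(z_context) - z_target ||^2."""
    return float(np.mean(np.sum((z_context @ W + b - z_target) ** 2, axis=1)))

def l2_normalise(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# Synthetic stand-in for "stable pairs": the target is a slightly
# perturbed copy of the context embedding (toy dimension 8, not 512).
rng = np.random.default_rng(0)
z_context = l2_normalise(rng.normal(size=(64, 8)))
z_target = l2_normalise(z_context + 0.1 * rng.normal(size=(64, 8)))

loss_before = mse(z_context, z_target, np.eye(8), np.zeros(8))
W, b = fit_linear_predictor(z_context, z_target)
loss_after = mse(z_context, z_target, W, b)
print(f"loss at identity init: {loss_before:.5f}, after fitting: {loss_after:.5f}")
```

Starting from the identity means the untrained head already predicts "no change", so training only has to learn the residual drift between context and target embeddings.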

These are probe heads, not standalone models. A checkpoint is only meaningful when paired with the exact frozen backbone it was trained on and the repository's embedding pipeline. There is no image-to-prediction from_pretrained path here.

⚠️ Read before interpreting the DermLIP numbers

The DermLIP headline (test AUROC 0.944) cannot be attributed to dermoscopy-domain transfer with the experiments behind these checkpoints. DermLIP's pretraining corpus, Derm1M, almost certainly contains HAM10000 raw images, so the contribution of dermoscopy-domain transfer versus HAM10000 image-level overlap is unpartitioned. The result should be read as "frozen DermLIP plus a linear scaffold reaches AUROC 0.944 on the held-out third nuisance family under this protocol, with HAM10000 contamination unpartitioned," not as a transfer claim. See the paper (§7.4, §8, Appendix H) and the repository README for the full caveat and the proposed partition experiment (EXP-009, not yet run).

What is published here

Two configurations, five seeds each (the contamination-relevant cells of the nine-experiment arc). All ten are single-step linear predictors over a frozen ViT-B/16 (CLIP architecture), embedding dim 512.

| Group | Frozen backbone (open_clip hf-hub: id) | Embedding id | Files |
| --- | --- | --- | --- |
| dermlip-exp007/ | redlessone/DermLIP_ViT-B-16 | dermlip_b16 | original seed + seeds 1–4 |
| biomedclip-exp008/ | microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224 | biomedclip_b16 | original seed + seeds 1–4 |

Each .npz is named for its source run in abdelstark/derma-jepa-runs, where the matching config, embeddings, metrics, and logs live.

Results

Test AUROC on the held-out third nuisance family (strong_held_out_2), the family never seen during predictor training. The strongest cheap baseline is pixel L2 at AUROC 0.580.

DermLIP (EXP-007), 5 seeds

| Checkpoint | Test AUROC |
| --- | --- |
| ham10000-hf-dermlip-exp007-v1 (seed 20260422) | 0.9447 |
| ham10000-hf-dermlip-exp007-seed-1-v1 | 0.9470 |
| ham10000-hf-dermlip-exp007-seed-2-v1 | 0.9392 |
| ham10000-hf-dermlip-exp007-seed-3-v1 | 0.9444 |
| ham10000-hf-dermlip-exp007-seed-4-v1 | 0.9422 |
| Seed mean | 0.9435 ± 0.0029 (95% CI[mean] [0.9409, 0.9461]) |

BiomedCLIP (EXP-008), 5 seeds

| Checkpoint | Test AUROC |
| --- | --- |
| ham10000-hf-biomedclip-exp008-v1 (seed 20260422) | 0.3247 |
| ham10000-hf-biomedclip-exp008-seed-1-v1 | 0.3436 |
| ham10000-hf-biomedclip-exp008-seed-2-v1 | 0.3358 |
| ham10000-hf-biomedclip-exp008-seed-3-v1 | 0.3119 |
| ham10000-hf-biomedclip-exp008-seed-4-v1 | 0.3269 |
| Seed mean | 0.3286 ± 0.0120 (95% CI[mean] [0.3181, 0.3391]) |
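The seed-mean rows can be cross-checked from the per-seed values. The card does not state how the interval on the mean was computed, so the sketch below makes an assumption: sample standard deviation (ddof=1) and a normal-approximation interval of mean ± 1.96 · sd / √n. Under that assumption the printed rows are reproduced to four decimals; the repository's own protocol remains the authoritative definition. The helper name `seed_summary` is hypothetical.

```python
import numpy as np

def seed_summary(aurocs, z=1.96):
    """Mean, sample SD, and normal-approximation CI of the mean.

    The 1.96 * sd / sqrt(n) interval is an assumption about how the
    card's "95% CI[mean]" was obtained, not a documented fact.
    """
    x = np.asarray(aurocs, dtype=np.float64)
    mean = float(x.mean())
    sd = float(x.std(ddof=1))                 # sample standard deviation
    half_width = z * sd / np.sqrt(len(x))
    return mean, sd, (mean - half_width, mean + half_width)

# Per-seed test AUROCs copied from the two results tables.
dermlip = [0.9447, 0.9470, 0.9392, 0.9444, 0.9422]
biomedclip = [0.3247, 0.3436, 0.3358, 0.3119, 0.3269]

for name, values in [("DermLIP", dermlip), ("BiomedCLIP", biomedclip)]:
    mean, sd, (lo, hi) = seed_summary(values)
    print(f"{name}: {mean:.4f} ± {sd:.4f} (95% CI[mean] [{lo:.4f}, {hi:.4f}])")
```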

BiomedCLIP (general-medical pretraining, with no documented ingestion of raw HAM10000/ISIC images) lifts test AUROC by only +0.04 over web CLIP and stays below chance (AUROC 0.5). It rules out the alternative explanation that any general-medical pretraining is sufficient, but it does not partition out HAM10000 overlap.

Checkpoint format

Each file is a compressed NumPy archive (np.load(path, allow_pickle=True)):

| Key | Shape | Meaning |
| --- | --- | --- |
| predictor_kind | (1,) | "linear" for every checkpoint here |
| feature_dim | (1,) int32 | 512 |
| input_embedding_model_id | (1,) | dermlip_b16 or biomedclip_b16 |
| weight | (512, 512) f32 | W |
| bias | (512,) f32 | b |
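Because a checkpoint is only meaningful with its matching backbone, it can help to validate the archive against the layout above before using it. The following is a minimal sketch, not part of the repository: `load_predictor` and its error messages are hypothetical, and it assumes the string fields are stored as NumPy string or object arrays.

```python
import numpy as np

EXPECTED_DIM = 512  # feature_dim documented for every checkpoint here

def load_predictor(path, expected_embedding_id):
    """Load (W, b) from a checkpoint and check it against the documented keys.

    Hypothetical helper: raises ValueError if the archive does not match
    the layout in the table or was trained on a different backbone.
    """
    with np.load(path, allow_pickle=True) as ckpt:
        kind = str(ckpt["predictor_kind"][0])
        dim = int(ckpt["feature_dim"][0])
        embedding_id = str(ckpt["input_embedding_model_id"][0])
        W = np.asarray(ckpt["weight"], dtype=np.float32)
        b = np.asarray(ckpt["bias"], dtype=np.float32)

    if kind != "linear":
        raise ValueError(f"unexpected predictor_kind: {kind!r}")
    if dim != EXPECTED_DIM:
        raise ValueError(f"unexpected feature_dim: {dim}")
    if embedding_id != expected_embedding_id:
        raise ValueError(
            f"checkpoint was trained on {embedding_id!r}, "
            f"but embeddings come from {expected_embedding_id!r}"
        )
    if W.shape != (dim, dim) or b.shape != (dim,):
        raise ValueError(f"unexpected shapes: W {W.shape}, b {b.shape}")
    return W, b
```

A call such as `load_predictor("dermlip-exp007/ham10000-hf-dermlip-exp007-v1.npz", "dermlip_b16")` would then fail loudly if DermLIP weights were paired with BiomedCLIP embeddings by mistake.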

How to use

The predictor consumes L2-normalised embeddings from the matching frozen backbone (produced by src/derma_jepa/embeddings.py, which loads DermLIP/BiomedCLIP via open_clip with the hf-hub: ids above). It does not take pixels.

```python
import numpy as np

ckpt = np.load("dermlip-exp007/ham10000-hf-dermlip-exp007-v1.npz", allow_pickle=True)
assert ckpt["predictor_kind"][0] == "linear"
W, b = ckpt["weight"], ckpt["bias"]            # (512, 512), (512,)

def predict(z):                                # z: (N, 512) L2-normalised frozen-backbone embeddings
    p = z @ W + b
    return p / np.linalg.norm(p, axis=1, keepdims=True)

# Per-pair change score: cosine distance between the predicted and observed
# target latent. Higher = more likely a genuinely *changing* pair.
def change_score(z_context, z_target):
    return 1.0 - np.sum(predict(z_context) * z_target, axis=1)
```

For the exact directional-AUROC convention, bootstrap-CI protocol, and the raw-cosine fallback, see src/derma_jepa/metrics.py and the paper's method and appendix sections. To reproduce a checkpoint end-to-end, run the matching launcher (scripts/hf_jobs_ham10000_exp007.sh or exp008.sh) as documented in the repository README.
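For orientation only, here is a generic rank-based AUROC (the Mann-Whitney U statistic) over per-pair change scores. It is a sketch, not the repository's directional-AUROC convention, which lives in src/derma_jepa/metrics.py. The label coding (1 = changing pair, 0 = stable pair) and the function name `auroc` are assumptions made for illustration.

```python
import numpy as np

def auroc(scores, labels):
    """Probability that a random positive outscores a random negative.

    Generic Mann-Whitney formulation with average ranks for ties.
    Assumes labels use 1 for changing pairs and 0 for stable pairs.
    """
    scores = np.asarray(scores, dtype=np.float64)
    labels = np.asarray(labels).astype(bool)
    n_pos = int(labels.sum())
    n_neg = int((~labels).sum())
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")

    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(len(scores), dtype=np.float64)

    # Assign 1-based ranks, averaging over runs of tied scores.
    i = 0
    while i < len(scores):
        j = i
        while j + 1 < len(scores) and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        ranks[order[i : j + 1]] = 0.5 * (i + j) + 1.0
        i = j + 1

    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return float(u / (n_pos * n_neg))

# Toy scores: changing pairs (label 1) receive the higher change scores.
print(auroc([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1]))
```

Under this formulation a value below 0.5, as in the BiomedCLIP rows, means the score ranks stable pairs above changing pairs more often than the reverse.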

Intended use and scope

In scope: research on frozen-backbone JEPA-style probes, the longitudinal-proxy task design, and reproduction of the reported numbers.

Out of scope: any diagnostic or clinical use. These are research artefacts on a synthetic longitudinal proxy over cross-sectional HAM10000 data. They are not medical advice and not validated for patient use.

Citation

@misc{bakhta2026dermajepa,
  author    = {Bakhta, Abdelhamid},
  title     = {DermaJEPA: Frozen-backbone JEPA-style probes for dermoscopic
               longitudinal-proxy generalisation on HAM10000},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.20556968},
  url       = {https://doi.org/10.5281/zenodo.20556968},
  note      = {Code: \url{https://github.com/AbdelStark/derma-jepa};
               run archive: \url{https://huggingface.co/datasets/abdelstark/derma-jepa-runs}}
}

License

MIT for the predictor weights and code (Copyright © 2026 Abdelhamid Bakhta). The frozen backbones these probes depend on retain their own licences (CC-BY-NC 4.0 for DermLIP; MIT for BiomedCLIP); verify each upstream licence before downstream use.
