DermaJEPA headline predictors
Trained JEPA-style linear predictor heads behind the headline results of the
DermaJEPA preliminary report. Each checkpoint is a small linear map
g(z) = z W + b (with W identity-warm-started) over the L2-normalised output
of a frozen vision backbone. The predictors are trained to minimise
|| g(z_context) - z_target ||^2 on stable image pairs, and scored at evaluation
by the cosine distance between g(z_context) and z_target.
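
The identity warm start ties the training loss to the evaluation score in a simple way. For unit vectors, ||a - b||^2 = 2(1 - a·b), so at initialisation (W = I, and assuming a zero initial bias, which the text above does not state) the squared error is exactly twice the cosine distance. The minimal sketch below checks this numerically on random stand-in embeddings, not real backbone outputs. After training the relation no longer holds exactly, because g(z) need not have unit norm.

```python
import numpy as np

rng = np.random.default_rng(0)


def l2_normalise(x):
    """Scale each row to unit L2 norm."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


# Random stand-ins for frozen-backbone embeddings (illustrative only).
z_context = l2_normalise(rng.normal(size=(8, 512)))
z_target = l2_normalise(rng.normal(size=(8, 512)))

# Identity warm start; the zero initial bias is an assumption of this sketch.
W = np.eye(512)
b = np.zeros(512)

g = z_context @ W + b                                # equals z_context at initialisation
sq_error = np.sum((g - z_target) ** 2, axis=1)       # training objective, per pair
cos_dist = 1.0 - np.sum(g * z_target, axis=1)        # evaluation score (g is unit-norm here)

assert np.allclose(sq_error, 2.0 * cos_dist)
```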
These are probe heads, not standalone models. A checkpoint is only meaningful
when paired with the exact frozen backbone it was trained on and the repository's
embedding pipeline. There is no image-to-prediction from_pretrained path here.
- Code, configs, and reproduction launchers: https://github.com/AbdelStark/derma-jepa
- Full run archive (manifests, embeddings, metrics, logs, model cards): abdelstark/derma-jepa-runs
- Paper: https://doi.org/10.5281/zenodo.20556968
⚠️ Read before interpreting the DermLIP numbers
The DermLIP headline (test AUROC 0.944) cannot be attributed to dermoscopy-domain transfer with the experiments behind these checkpoints. DermLIP's pretraining corpus, Derm1M, almost certainly contains HAM10000 raw images, so the contribution of dermoscopy-domain transfer versus HAM10000 image-level overlap is unpartitioned. The result should be read as "frozen DermLIP plus a linear scaffold reaches AUROC 0.944 on the held-out third nuisance family under this protocol, with HAM10000 contamination unpartitioned," not as a transfer claim. See the paper (§7.4, §8, Appendix H) and the repository README for the full caveat and the proposed partition experiment (EXP-009, not yet run).
What is published here
Two configurations, five seeds each (the contamination-relevant cells of the nine-experiment arc). All ten are single-step linear predictors over a frozen ViT-B/16 (CLIP architecture), embedding dim 512.
| Group | Frozen backbone (open_clip hf-hub: id) | Embedding id | Files |
|---|---|---|---|
| dermlip-exp007/ | redlessone/DermLIP_ViT-B-16 | dermlip_b16 | original seed + seeds 1–4 |
| biomedclip-exp008/ | microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224 | biomedclip_b16 | original seed + seeds 1–4 |
Each .npz is named for its source run in abdelstark/derma-jepa-runs,
where the matching config, embeddings, metrics, and logs live.
Results
Test AUROC on the held-out third nuisance family (strong_held_out_2), the
family never seen during predictor training. The strongest cheap baseline is
pixel L2 at AUROC 0.580.
DermLIP (EXP-007), 5 seeds
| Checkpoint | Test AUROC |
|---|---|
| ham10000-hf-dermlip-exp007-v1 (seed 20260422) | 0.9447 |
| ham10000-hf-dermlip-exp007-seed-1-v1 | 0.9470 |
| ham10000-hf-dermlip-exp007-seed-2-v1 | 0.9392 |
| ham10000-hf-dermlip-exp007-seed-3-v1 | 0.9444 |
| ham10000-hf-dermlip-exp007-seed-4-v1 | 0.9422 |
| Seed mean | 0.9435 ± 0.0029 (95% CI of the mean: [0.9409, 0.9461]) |
BiomedCLIP (EXP-008), 5 seeds
| Checkpoint | Test AUROC |
|---|---|
| ham10000-hf-biomedclip-exp008-v1 (seed 20260422) | 0.3247 |
| ham10000-hf-biomedclip-exp008-seed-1-v1 | 0.3436 |
| ham10000-hf-biomedclip-exp008-seed-2-v1 | 0.3358 |
| ham10000-hf-biomedclip-exp008-seed-3-v1 | 0.3119 |
| ham10000-hf-biomedclip-exp008-seed-4-v1 | 0.3269 |
| Seed mean | 0.3286 ± 0.0120 (95% CI of the mean: [0.3181, 0.3391]) |
BiomedCLIP (general-medical, no documented raw HAM10000/ISIC ingestion) raises test AUROC by only 0.04 over web CLIP and stays below chance (AUROC 0.5); it partitions out the "any general-medical pretraining is sufficient" alternative but not HAM10000 overlap.
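
The seed-mean rows in the two tables above can be recomputed from the per-seed values. The sketch below assumes the ± figure is the sample standard deviation (n - 1 denominator) and the interval is a normal-approximation 95% CI of the mean (1.96 · SD / √n). These conventions are inferred from the reported numbers rather than stated here; the repository's metrics code is the authoritative source.

```python
import math
import statistics

# Per-seed test AUROCs copied from the two results tables.
auroc = {
    "dermlip-exp007": [0.9447, 0.9470, 0.9392, 0.9444, 0.9422],
    "biomedclip-exp008": [0.3247, 0.3436, 0.3358, 0.3119, 0.3269],
}


def summarise(values, z=1.96):
    """Return mean, sample SD, and a normal-approximation 95% CI of the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)                    # n - 1 denominator
    half_width = z * sd / math.sqrt(len(values))     # z times standard error
    return mean, sd, (mean - half_width, mean + half_width)


summary = {name: summarise(values) for name, values in auroc.items()}
for name, (mean, sd, (lo, hi)) in summary.items():
    print(f"{name}: {mean:.4f} ± {sd:.4f}, 95% CI [{lo:.4f}, {hi:.4f}]")
```

Under these assumptions the output matches both seed-mean rows to four decimal places. With only five seeds, a t-based interval would be wider than this normal approximation.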
Checkpoint format
Each file is a compressed NumPy archive (np.load(path, allow_pickle=True)):
| Key | Shape | Meaning |
|---|---|---|
| predictor_kind | (1,) | "linear" for every checkpoint here |
| feature_dim | (1,) int32 | 512 |
| input_embedding_model_id | (1,) | dermlip_b16 or biomedclip_b16 |
| weight | (512, 512) f32 | W |
| bias | (512,) f32 | b |
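
Because a checkpoint is only meaningful with its matching backbone, it can help to check a file against the documented keys before use. The sketch below is a hypothetical helper (`validate_checkpoint` is not part of the repository) that checks the keys, shapes, and dtypes listed in the table and returns the embedding id. It is exercised on a synthetic stand-in file, not a real checkpoint.

```python
import os
import tempfile

import numpy as np

EXPECTED_DIM = 512


def validate_checkpoint(path):
    """Check a file against the documented keys, shapes, and dtypes."""
    with np.load(path, allow_pickle=True) as ckpt:
        kind = str(ckpt["predictor_kind"][0])
        dim = int(ckpt["feature_dim"][0])
        model_id = str(ckpt["input_embedding_model_id"][0])
        weight, bias = ckpt["weight"], ckpt["bias"]
    if kind != "linear":
        raise ValueError(f"unexpected predictor_kind: {kind!r}")
    if dim != EXPECTED_DIM or weight.shape != (dim, dim) or bias.shape != (dim,):
        raise ValueError("weight/bias shapes do not match feature_dim")
    if weight.dtype != np.float32 or bias.dtype != np.float32:
        raise ValueError("weight and bias are expected to be float32")
    return model_id


# Synthetic stand-in (identity weights), written only to exercise the check.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.npz")
    np.savez_compressed(
        demo_path,
        predictor_kind=np.array(["linear"]),
        feature_dim=np.array([512], dtype=np.int32),
        input_embedding_model_id=np.array(["dermlip_b16"]),
        weight=np.eye(512, dtype=np.float32),
        bias=np.zeros(512, dtype=np.float32),
    )
    backbone_id = validate_checkpoint(demo_path)
```

Comparing the returned id with the embedding id of the backbone that produced the inputs is a cheap guard against pairing a predictor with the wrong encoder.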
How to use
The predictor consumes L2-normalised embeddings from the matching frozen
backbone (produced by src/derma_jepa/embeddings.py,
which loads DermLIP/BiomedCLIP via open_clip with the hf-hub: ids above). It
does not take pixels.
```python
import numpy as np

ckpt = np.load("dermlip-exp007/ham10000-hf-dermlip-exp007-v1.npz", allow_pickle=True)
assert ckpt["predictor_kind"][0] == "linear"
W, b = ckpt["weight"], ckpt["bias"]  # (512, 512), (512,)


def predict(z):
    """z: (N, 512) L2-normalised frozen-backbone embeddings."""
    p = z @ W + b
    return p / np.linalg.norm(p, axis=1, keepdims=True)


# Per-pair change score: cosine distance between the predicted and observed
# target latent. Higher = more likely a genuinely *changing* pair.
def change_score(z_context, z_target):
    return 1.0 - np.sum(predict(z_context) * z_target, axis=1)
```
For the exact directional-AUROC convention, bootstrap-CI protocol, and the
raw-cosine fallback, see src/derma_jepa/metrics.py
and the paper's method and appendix sections. To reproduce a checkpoint
end-to-end, run the matching launcher (scripts/hf_jobs_ham10000_exp007.sh or
exp008.sh) as documented in the repository README.
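
For orientation only, the sketch below shows a generic rank-based AUROC over per-pair change scores. It is not the repository's directional-AUROC convention, which lives in src/derma_jepa/metrics.py. The label coding (1 = changing pair, 0 = stable pair) and the toy scores are assumptions made for illustration.

```python
import numpy as np


def auroc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one positive and one negative pair")
    diff = pos[:, None] - neg[None, :]               # all positive-negative score gaps
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


# Hypothetical change scores: label 1 = changing pair, 0 = stable pair.
toy_scores = [0.05, 0.10, 0.30, 0.35, 0.80]
toy_labels = [0, 0, 1, 0, 1]

toy_auroc = auroc(toy_scores, toy_labels)                     # 5 of 6 pairs ordered correctly
inverted_auroc = auroc([-s for s in toy_scores], toy_labels)  # same scores, sign flipped
```

Flipping the sign of the scores maps an AUROC of a to 1 - a when there are no ties. An AUROC below 0.5 therefore means the score ranks pairs in the opposite direction to the labels.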
Intended use and scope
In scope: research on frozen-backbone JEPA-style probes, the longitudinal-proxy task design, and reproduction of the reported numbers.
Out of scope: any diagnostic or clinical use. These are research artefacts on a synthetic longitudinal proxy over cross-sectional HAM10000 data. They are not medical advice and not validated for patient use.
Citation
@misc{bakhta2026dermajepa,
author = {Bakhta, Abdelhamid},
title = {DermaJEPA: Frozen-backbone JEPA-style probes for dermoscopic
longitudinal-proxy generalisation on HAM10000},
year = {2026},
publisher = {Zenodo},
doi = {10.5281/zenodo.20556968},
url = {https://doi.org/10.5281/zenodo.20556968},
note = {Code: \url{https://github.com/AbdelStark/derma-jepa};
run archive: \url{https://huggingface.co/datasets/abdelstark/derma-jepa-runs}}
}
License
MIT for the predictor weights and code (Copyright © 2026 Abdelhamid Bakhta). The frozen backbones these probes depend on retain their own licences (CC-BY-NC 4.0 for DermLIP; MIT for BiomedCLIP) — verify each upstream licence before downstream use.