arxiv:2507.15216

Improving Joint Embedding Predictive Architecture with Diffusion Noise

Published on Jul 21

Authors:

Abstract

N-JEPA integrates diffusion noise into masked image modeling (MIM) to enhance feature learning for competitive recognition tasks.

AI-generated summary

Self-supervised learning has become an incredibly successful method for feature learning, widely applied to many downstream tasks. It has proven especially effective for discriminative tasks, surpassing the trending generative models. However, generative models perform better in image generation and detail enhancement. Thus, it is natural for us to find a connection between SSL and generative models to further enhance the representation capacity of SSL. As generative models can create new samples by approximating the data distribution, such modeling should also lead to a semantic understanding of the raw visual data, which is necessary for recognition tasks. This enlightens us to combine the core principle of the diffusion model: diffusion noise, with SSL to learn a competitive recognition model. Specifically, diffusion noise can be viewed as a particular state of mask that reveals a close relationship between masked image modeling (MIM) and diffusion models. In this paper, we propose N-JEPA (Noise-based JEPA) to incorporate diffusion noise into MIM by the position embedding of masked tokens. The multi-level noise schedule is a series of feature augmentations to further enhance the robustness of our model. We perform a comprehensive study to confirm its effectiveness in the classification of downstream tasks. Codes will be released soon in public.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2507.15216 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2507.15216 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2507.15216 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.