arxiv:2412.05831

Semi-Supervised Contrastive Learning for Controllable Video-to-Music Retrieval

Published on Dec 8, 2024

Authors:

Abstract

A framework for automatic music-video retrieval combines self-supervised and supervised learning to create a joint embedding space, enabling controllable and effective retrieval based on various music annotations.

AI-generated summary

Content creators often use music to enhance their videos, from soundtracks in movies to background music in video blogs and social media content. However, identifying the best music for a video can be a difficult and time-consuming task. To address this challenge, we propose a novel framework for automatically retrieving a matching music clip for a given video, and vice versa. Our approach leverages annotated music labels, as well as the inherent artistic correspondence between visual and music elements. Distinct from previous cross-modal music retrieval works, our method combines both self-supervised and supervised training objectives. We use self-supervised and label-supervised contrastive learning to train a joint embedding space between music and video. We show the effectiveness of our approach by using music genre labels for the supervised training component, and our framework can be generalized to other music annotations (e.g., emotion, instrument, etc.). Furthermore, our method enables fine-grained control over how much the retrieval process focuses on self-supervised vs. label information at inference time. We evaluate the learned embeddings through a variety of video-to-music and music-to-video retrieval tasks. Our experiments show that the proposed approach successfully combines self-supervised and supervised objectives and is effective for controllable music-video retrieval.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2412.05831 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2412.05831 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2412.05831 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.