arxiv:2510.21223

Model Merging with Functional Dual Anchors

Published on Oct 24

Submitted by Kexuan Shi on Oct 27

CUHK SphereLab


Authors:

Kexuan Shi ,

Abstract

Functional Dual Anchors (FDAs) enhance model merging by aligning gradients with task vectors in the input-representation space, offering robustness and flexibility compared to parameter-space methods.

AI-generated summary

Model merging is an efficient post-training strategy for integrating knowledge from multiple finetuned checkpoints of a shared foundation model. Existing methods operate in the parameter space, combining task vectors to mitigate conflicts, but remain constrained by parameter inconsistencies. We propose Functional Dual Anchors (FDAs), a framework that instead models the input-representation space. FDAs are synthetic inputs whose induced gradients align with task vectors, capturing task-specific functional shifts relative to the pretrained model. This perspective bridges joint multi-task training and post-hoc merging, offering both robustness and flexibility. We further introduce a principled initialization scheme and show that FDAs are complementary to parameter-space model merging. Comprehensive experiments demonstrate the effectiveness of FDAs in model merging.


Community

KexuanShi

Paper author Paper submitter 4 days ago

This work presents a new perspective on model merging, along with a new way to use the knowledge stored in finetuned checkpoints.

Model merging is an appealing post-training strategy for integrating knowledge from existing checkpoints of a shared foundation model. Existing methods operate in the parameter space (i.e., on task vectors) and are therefore limited by the complexity of that space.
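For readers new to the parameter-space view, here is a minimal sketch of task vectors and the simplest way of combining them (plain task arithmetic). It is not taken from the paper: the `Params` layout, the function names, and the `scale` coefficient are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical flat parameter store: tensor name -> list of values.
Params = Dict[str, List[float]]


def task_vector(pretrained: Params, finetuned: Params) -> Params:
    """Task vector: finetuned weights minus pretrained weights, per tensor."""
    return {
        name: [f - p for f, p in zip(finetuned[name], base)]
        for name, base in pretrained.items()
    }


def merge_task_arithmetic(pretrained: Params, vectors: List[Params], scale: float = 1.0) -> Params:
    """Add the scaled sum of all task vectors back onto the pretrained weights."""
    merged = {}
    for name, base in pretrained.items():
        summed = [sum(v[name][i] for v in vectors) for i in range(len(base))]
        merged[name] = [b + scale * s for b, s in zip(base, summed)]
    return merged


# Toy checkpoints (made-up numbers).
theta0 = {"w": [1.0, 2.0]}
theta_a = {"w": [1.5, 2.0]}
theta_b = {"w": [1.0, 1.0]}

tau_a = task_vector(theta0, theta_a)
tau_b = task_vector(theta0, theta_b)
merged = merge_task_arithmetic(theta0, [tau_a, tau_b], scale=0.5)
```

Everything here happens directly on the weights, which is the setting that the comment contrasts with input-space modeling.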

We propose Functional Dual Anchors (FDAs), a framework that instead models the knowledge in the input-representation space. Specifically, FDAs are synthetic inputs whose induced gradients align with task vectors, capturing task-specific functional shifts relative to the pretrained model. Then, we use the FDAs to adapt the pretrained model. FDAs provide an alternative perspective on model merging by extending input-space modeling to this setting and bridging joint multi-task training and post-hoc merging.
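To make "synthetic inputs whose induced gradients align with task vectors" concrete, here is a toy sketch under strong assumptions. It is not the paper's algorithm. The model is a scalar linear map `w . x`, the loss is the squared gap between the pretrained and finetuned outputs on the anchor, and the anchor is fitted by finite-difference gradient ascent on cosine similarity. All names and numbers are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def induced_update(w0, wi, x):
    """Negative gradient of 0.5 * (w.x - wi.x)^2 w.r.t. w, evaluated at w = w0."""
    residual = dot(w0, x) - dot(wi, x)
    return [-residual * xj for xj in x]


def alignment(w0, wi, x):
    """Cosine similarity between the anchor-induced update and the task vector."""
    tau = [a - b for a, b in zip(wi, w0)]
    g = induced_update(w0, wi, x)
    denom = norm(g) * norm(tau)
    return dot(g, tau) / denom if denom > 0 else 0.0


def fit_anchor(w0, wi, x_init, steps=500, lr=0.1, eps=1e-5):
    """Gradient ascent on alignment w.r.t. the anchor x (central finite differences)."""
    x = list(x_init)
    for _ in range(steps):
        grad = []
        for j in range(len(x)):
            xp = x[:]
            xp[j] += eps
            xm = x[:]
            xm[j] -= eps
            grad.append((alignment(w0, wi, xp) - alignment(w0, wi, xm)) / (2 * eps))
        x = [xj + lr * gj for xj, gj in zip(x, grad)]
    return x


# Toy pretrained and finetuned weights (made-up numbers).
w0 = [0.2, -0.1, 0.4]
wi = [0.5, -0.5, 0.9]
x_init = [1.0, 0.5, 0.0]

before = alignment(w0, wi, x_init)
anchor = fit_anchor(w0, wi, x_init)
after = alignment(w0, wi, anchor)
```

In this toy setting the induced update is `(tau . x) * x`, so the fitted anchor ends up nearly parallel to the task vector. A gradient step on the pretrained weights using only the anchor then moves them in roughly the same direction as the task vector. For deep networks the relationship is not this simple, which is presumably where the optimization and initialization scheme mentioned in the abstract come in.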

💬 We welcome discussions, feedback, and collaborations on this direction!