I was quite confused by the claim that `linear` corresponds to the task arithmetic paper.
That paper operates on the weight difference between the base and fine-tuned (FT) models, so when the FT model was trained with LoRA, the difference is a vector of dimension `(num_params,)` with (see the sketch after this list):
- 0s on modules that were not LoRAed
- `(BA).flatten()` on modules that got a LoRA
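A minimal sketch of what that task vector looks like, assuming a toy two-module model where only one module gets a LoRA (the module names, shapes, and rank are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model: two weight matrices, only the first has a LoRA.
shapes = {"attn.W_q": (8, 8), "mlp.W_in": (8, 16)}
rank = 2

# LoRA factors for the adapted module (made-up values).
B = rng.normal(size=(8, rank))
A = rng.normal(size=(rank, 8))

# Task-arithmetic vector: flatten the *full* weight delta of every module.
delta_chunks = []
for name, shape in shapes.items():
    if name == "attn.W_q":              # LoRAed module: delta is BA
        delta_chunks.append((B @ A).flatten())
    else:                               # non-LoRAed module: delta is 0
        delta_chunks.append(np.zeros(np.prod(shape)))

task_vector = np.concatenate(delta_chunks)  # shape: (num_params,)
print(task_vector.shape)                    # (192,) here: 8*8 + 8*16
```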
The merging from that paper then corresponds to CAT, not linear, in my opinion: summing the full deltas gives `sum_i B_i A_i`, which is exactly what concatenating the factors computes, since `[B_1 ... B_n] @ [A_1; ...; A_n] = sum_i B_i A_i`. Treating the flattened LoRA weights themselves (the raw `A_i`/`B_i` parameters) as the task vector doesn't make sense in this context.
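A quick numeric check of that claim, assuming `linear` means summing the LoRA factors directly while CAT concatenates them along the rank dimension (toy shapes, uniform merge weights):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n = 8, 2, 3  # hypothetical dims: width, LoRA rank, number of adapters

Bs = [rng.normal(size=(d, r)) for _ in range(n)]
As = [rng.normal(size=(r, d)) for _ in range(n)]

# Task arithmetic on full weight deltas: sum_i B_i A_i
task_arith = sum(B @ A for B, A in zip(Bs, As))

# CAT: concatenate factors along the rank dimension -> one rank n*r adapter
B_cat = np.concatenate(Bs, axis=1)  # (d, n*r)
A_cat = np.concatenate(As, axis=0)  # (n*r, d)
cat = B_cat @ A_cat                 # equals sum_i B_i A_i exactly

# linear (as I read it): sum the factors first, then multiply
lin = sum(Bs) @ sum(As)             # = sum_{i,j} B_i A_j, with cross terms

print(np.allclose(cat, task_arith))  # True
print(np.allclose(lin, task_arith))  # False (extra B_i A_j cross terms)
```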