arxiv:2410.15153

Evaluating Deep Unlearning in Large Language Models

Published on Nov 11, 2025

Authors:

Abstract

Deep unlearning in large language models aims to remove target facts while preventing logical deduction of those facts from retained knowledge, requiring new evaluation metrics and datasets to assess effectiveness.

AI-generated summary

Machine unlearning has emerged as an important component in developing safe and trustworthy models. Prior work on fact unlearning in LLMs has mostly focused on removing a specified target fact robustly, but often overlooks its deductive connections to other knowledge. We propose a new setting for fact unlearning, deep unlearning, where the goal is not only to remove a target fact but also to prevent it from being deduced via retained knowledge in the LLM and logical reasoning. We propose three novel metrics: Success-DU and Recall to measure unlearning efficacy, and Accuracy to measure the remainder model utility. To benchmark this setting, we leverage both (1) an existing real-world knowledge dataset, MQuAKE, that provides one-step deduction instances, and (2) newly construct a novel semi-synthetic dataset, Eval-DU, that allows multiple steps of realistic deductions among synthetic facts. Experiments reveal that current methods struggle with deep unlearning: they either fail to deeply unlearn, or excessively remove unrelated facts. Our results suggest that targeted algorithms may have to be developed for robust/deep fact unlearning in LLMs.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Get this paper in your agent:

hf papers read 2410.15153

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2410.15153 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2410.15153 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2410.15153 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.