✨ KG-TRACES: Unleashing Explainable Reasoning in LLMs with Knowledge Graphs ✨

This repository contains the official implementation of KG-TRACES, a novel framework that enhances the reasoning ability of Large Language Models (LLMs) through explicit supervision over reasoning paths and processes. KG-TRACES aims to provide explainable, accurate, and traceable reasoning by leveraging the power of Knowledge Graphs.

For more details, refer to the accompanying paper: KG-TRACES: Enhancing Large Language Models with Knowledge Graph-constrained Trajectory Reasoning and Attribution Supervision (https://arxiv.org/abs/2506.00783)

The full codebase and more information can be found on the official GitHub repository: https://github.com/Edaizi/KG-TRACES

*Figure 1: Comparison of reasoning methods. KG-TRACES (d) stands out by generating faithful, attributable responses and adapting to different KG access conditions.*

💡 Our Solution: KG-TRACES

KG-TRACES is a novel framework that explicitly teaches LLMs how to reason by supervising their internal "thought process" with knowledge graph guidance. We guide them to:

🗺️ Chart the Course: Predict symbolic knowledge graph reasoning paths from question to answer.
📝 Show Their Work: Generate attribution-aware reasoning explanations that state whether each step comes from the KG or from the LLM's internal knowledge 🧠, and how effective that step was!

*Figure 2: Overview of the KG-TRACES framework*
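To make the two supervision targets more concrete, the sketch below models a reasoning trace as a list of steps, each carrying a KG-style triple, a source label, and an effectiveness flag. The class `ReasoningStep`, the labels `"KG"` and `"LLM"`, the helper names, and the toy entities are all illustrative assumptions, not the data format KG-TRACES actually uses.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical source labels; the labels used in the real training data may differ.
KG = "KG"
LLM = "LLM"


@dataclass
class ReasoningStep:
    triple: Tuple[str, str, str]  # (head entity, relation, tail entity)
    source: str                   # KG or LLM: where the step's evidence comes from
    effective: bool               # whether the step helped reach the answer


def relation_path(steps: List[ReasoningStep]) -> List[str]:
    """Return the symbolic relation path, i.e. the relations in step order."""
    return [step.triple[1] for step in steps]


def format_trace(steps: List[ReasoningStep]) -> str:
    """Render one attribution-tagged line per reasoning step."""
    lines = []
    for index, step in enumerate(steps, start=1):
        head, relation, tail = step.triple
        status = "effective" if step.effective else "ineffective"
        lines.append(
            f"Step {index} [{step.source}, {status}]: {head} --{relation}--> {tail}"
        )
    return "\n".join(lines)


# Toy example with made-up entities and relations.
trace = [
    ReasoningStep(("Book A", "written_by", "Author B"), KG, True),
    ReasoningStep(("Author B", "born_in", "City C"), LLM, True),
]
print(relation_path(trace))
print(format_trace(trace))
```

Keeping the relation path separate from the rendered explanation mirrors the two outputs described above: a symbolic path from question to answer, and a step-by-step explanation that names its evidence source.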

🌟 Why KG-TRACES Rocks

🔍 Crystal-Clear Explanations: Understand why the LLM reached its conclusion.
🛡️ Trustworthy & Attributable: Know the evidence source of each reasoning step.
💪 Robust Performance: Excels even with limited or no direct KG access during inference.
🌍 Versatile: Shows strong generalization to specialized fields like medicine.

🚀 Quickstart: Pretrained Models

You can easily load our fine-tuned KG-TRACES models from the Hugging Face Model Hub using the transformers library:

from transformers import AutoModelForCausalLM, AutoTokenizer

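# Hugging Face Hub ID of the fine-tuned KG-TRACES checkpoint
model_hub_name = "Edaizi/KG-TRACES"
# Download (or load from the local cache) the tokenizer and model weights
tokenizer = AutoTokenizer.from_pretrained(model_hub_name)
model = AutoModelForCausalLM.from_pretrained(model_hub_name)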
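Once the model is loaded, answering a question might look roughly like the sketch below. The prompt template in `build_prompt` is a hypothetical placeholder (the format used during fine-tuning may differ, so check the GitHub repository), and the function names, `max_new_tokens` default, and bfloat16 loading are assumptions rather than documented settings.

```python
def build_prompt(question: str) -> str:
    """Hypothetical plain-text prompt; the template used in training may differ."""
    return f"Question: {question.strip()}\nAnswer:"


def generate_answer(
    question: str,
    model_hub_name: str = "Edaizi/KG-TRACES",
    max_new_tokens: int = 256,
) -> str:
    """Load the model and return only the newly generated text for one question."""
    # Heavy dependencies are imported lazily so build_prompt works without them.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_hub_name)
    # Assumption: load in bfloat16 to reduce memory use.
    model = AutoModelForCausalLM.from_pretrained(
        model_hub_name, torch_dtype=torch.bfloat16
    )

    inputs = tokenizer(build_prompt(question), return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens so only the continuation is decoded.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_answer("...")` with your own question downloads the checkpoint on first use, so expect a large download and substantial memory use for a model of this size.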

📚 Datasets

We've prepared augmented SFT datasets for WebQSP and CWQ, each annotated with reasoning paths and reasoning processes with source attributions. Find them on Hugging Face: Edaizi/KG-TRACES-WebQSP and Edaizi/KG-TRACES-CWQ.

You can load these datasets as follows:

from datasets import load_dataset

# Each call downloads the dataset from the Hub (or reuses the local cache)
webqsp_sft_data = load_dataset("Edaizi/KG-TRACES-WebQSP")
cwq_sft_data = load_dataset("Edaizi/KG-TRACES-CWQ")
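Since the column layout of these datasets is not described here, a quick way to get oriented is to list each split with its row count and column names. The helper names below (`summarize_splits`, `print_summary`) are illustrative; the sketch relies only on the `num_rows` and `column_names` attributes of a `datasets.Dataset` and assumes `load_dataset` returns a mapping of splits.

```python
from typing import Dict, List, Mapping, Tuple


def summarize_splits(dataset_dict: Mapping) -> Dict[str, Tuple[int, List[str]]]:
    """Map each split name to (number of rows, column names)."""
    return {
        name: (split.num_rows, list(split.column_names))
        for name, split in dataset_dict.items()
    }


def print_summary(dataset_id: str) -> None:
    """Download a dataset from the Hub and print a one-line summary per split."""
    # Imported lazily so summarize_splits can be used without `datasets` installed.
    from datasets import load_dataset

    summary = summarize_splits(load_dataset(dataset_id))
    for name, (num_rows, columns) in summary.items():
        print(f"{name}: {num_rows} rows, columns={columns}")
```

For example, `print_summary("Edaizi/KG-TRACES-WebQSP")` downloads the WebQSP variant and prints whatever splits and columns it contains.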

📜 Citation

If KG-TRACES helps your research or project, we'd love a shout-out! Please cite:

@misc{wu2025kgtracesenhancinglargelanguage,
      title={KG-TRACES: Enhancing Large Language Models with Knowledge Graph-constrained Trajectory Reasoning and Attribution Supervision}, 
      author={Rong Wu and Pinlong Cai and Jianbiao Mei and Licheng Wen and Tao Hu and Xuemeng Yang and Daocheng Fu and Botian Shi},
      year={2025},
      eprint={2506.00783},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2506.00783}, 
}

Model size: 8B params (Safetensors). Tensor type: BF16.
