
LLaDA-MoE-7B-A1B-Base-MemDLM

Research artifact. This adapter is released for academic and research purposes. It was trained on a limited dataset as a proof-of-concept for the MemDLM method and is not intended for production use.

This repository contains the LoRA adapter for inclusionAI/LLaDA-MoE-7B-A1B-Base, trained using the method described in the paper MemDLM: Memory-Enhanced DLM Training.

MemDLM (Memory-Enhanced DLM) bridges the train-inference gap in Diffusion Language Models (DLMs) by embedding a simulated denoising process into training via Bi-level Optimization. This adapter is designed to enhance long-context understanding in discrete diffusion language models.
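To make the simulated denoising idea concrete, here is a minimal, self-contained sketch of the iterative mask-denoising loop that discrete diffusion LMs run at inference, which MemDLM embeds into training. This is a toy illustration with a dummy denoiser, not the actual MemDLM implementation: the token vocabulary, the `toy_denoiser` stand-in, and the unmasking schedule are all assumptions made for clarity.

```python
import random

MASK = "[MASK]"

def toy_denoiser(tokens, reference):
    # Stand-in for the model: "predicts" the reference token at each
    # masked position. A real DLM would score the whole vocabulary.
    return [reference[i] if t == MASK else t for i, t in enumerate(tokens)]

def simulate_denoising(reference, num_steps=4, seed=0):
    """Start from a fully masked sequence and unmask a share of the
    remaining positions at each step, mimicking reverse diffusion."""
    rng = random.Random(seed)
    tokens = [MASK] * len(reference)
    masked = list(range(len(reference)))
    for step in range(num_steps):
        preds = toy_denoiser(tokens, reference)
        # Unmask a roughly equal share of the still-masked positions.
        k = max(1, len(masked) // (num_steps - step))
        for i in rng.sample(masked, k):
            tokens[i] = preds[i]
        masked = [i for i in masked if tokens[i] == MASK]
        if not masked:
            break
    return tokens

print(simulate_denoising(["the", "cat", "sat", "down"]))
```

During standard DLM training this multi-step loop is absent (the model only sees single-step denoising targets), which is the train-inference gap the paper addresses; MemDLM runs a loop like this inside training and optimizes through it via bi-level optimization.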

Usage

import transformers
from peft import PeftModel

# Load tokenizer and base model
tokenizer = transformers.AutoTokenizer.from_pretrained(
    "inclusionAI/LLaDA-MoE-7B-A1B-Base",
    trust_remote_code=True,
)
base_model = transformers.AutoModel.from_pretrained(
    "inclusionAI/LLaDA-MoE-7B-A1B-Base",
    trust_remote_code=True,
    torch_dtype="auto",
)

# Apply the MemDLM LoRA adapter on top of the frozen base weights
model = PeftModel.from_pretrained(base_model, "JarvisPei/LLaDA-MoE-7B-A1B-Base-MemDLM")
model.eval()
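`PeftModel` applies the LoRA update on top of the frozen base weights. Conceptually, each adapted layer computes W x + (alpha/r) * B A x, where A and B are the low-rank factors stored in this repository. The NumPy sketch below illustrates that standard LoRA formulation with illustrative shapes (not those of LLaDA-MoE); it is a conceptual aid, not this adapter's code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen base weight and a rank-r LoRA update (shapes are illustrative).
d_out, d_in, r, alpha = 8, 8, 2, 16
W = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((r, d_in))   # A is initialized randomly
B = np.zeros((d_out, r))             # B is initialized to zero in LoRA

def lora_forward(x):
    # Base path plus scaled low-rank path: W x + (alpha/r) * B (A x)
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B = 0 the adapter is a no-op, so outputs match the base model.
assert np.allclose(lora_forward(x), W @ x)

# Merging folds the update into W; peft's merge_and_unload() does this
# for real adapters when you want a single set of weights at inference.
W_merged = W + (alpha / r) * (B @ A)
assert np.allclose(W_merged @ x, lora_forward(x))
```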

Evaluation

See the MemDLM repo for evaluation scripts:

bash examples/llada/eval_run.sh \
    --adapter_model_name_or_path JarvisPei/LLaDA-MoE-7B-A1B-Base-MemDLM

Citation

If you find this work useful, please cite the paper:

@article{pei2026memdlm,
    title   = {MemDLM: Memory-Enhanced DLM Training},
    author  = {Zehua Pei and Hui-Ling Zhen and Weizhe Lin and Sinno Jialin Pan and Yunhe Wang and Mingxuan Yuan and Bei Yu},
    year    = {2026},
    journal = {arXiv preprint arXiv:2603.22241},
}

License

MIT
