# LLaDA-MoE-7B-A1B-Base-MemDLM
> **Research artifact.** This adapter is released for academic and research purposes. It was trained on a limited dataset as a proof-of-concept for the MemDLM method and is not intended for production use.
This repository contains the LoRA adapter for [inclusionAI/LLaDA-MoE-7B-A1B-Base](https://huggingface.co/inclusionAI/LLaDA-MoE-7B-A1B-Base), trained using the method described in the paper *MemDLM: Memory-Enhanced DLM Training*.
MemDLM (Memory-Enhanced DLM) bridges the train-inference gap in Diffusion Language Models (DLMs) by embedding a simulated denoising process into training via Bi-level Optimization. This adapter is designed to enhance long-context understanding in discrete diffusion language models.
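To make the idea concrete, here is a minimal toy sketch of what "simulating the denoising process during training" looks like for a masked discrete-diffusion model: a fully masked sequence is progressively unmasked over a fixed number of steps, revealing the most confident positions first. This is purely illustrative; the stand-in scorer `toy_predict` and all names here are hypothetical and do not reflect the paper's actual implementation or the adapter's training code.

```python
import random

MASK = "<mask>"

def toy_predict(seq, target):
    # Stand-in for the DLM: for each still-masked position, return a
    # (confidence, predicted_token) pair. This toy "model" always knows
    # the right token and assigns it a random confidence score.
    return {i: (random.random(), target[i])
            for i, tok in enumerate(seq) if tok == MASK}

def simulated_denoise(target, steps=4):
    # Start from a fully masked sequence and unmask it over `steps`
    # rounds, committing the most confident predictions first -- the
    # same kind of iterative trajectory a DLM follows at inference time.
    seq = [MASK] * len(target)
    for step in range(steps):
        preds = toy_predict(seq, target)
        if not preds:
            break
        # Reveal roughly an equal share of the remaining masks per step.
        k = max(1, len(preds) // (steps - step))
        ranked = sorted(preds.items(), key=lambda kv: -kv[1][0])
        for i, (_, tok) in ranked[:k]:
            seq[i] = tok
    return seq
```

In MemDLM, a trajectory like this is embedded inside the training loop (via bi-level optimization), so the model is optimized on the same iteratively denoised inputs it will actually see at inference.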
## Usage
```python
import transformers
from peft import PeftModel

# Load the base model
base_model = transformers.AutoModel.from_pretrained(
    "inclusionAI/LLaDA-MoE-7B-A1B-Base",
    trust_remote_code=True,
    torch_dtype="auto",
)

# Load the MemDLM LoRA adapter on top of it
model = PeftModel.from_pretrained(base_model, "JarvisPei/LLaDA-MoE-7B-A1B-Base-MemDLM")
model.eval()
```
## Evaluation
See the MemDLM repository for evaluation scripts:

```bash
bash examples/llada/eval_run.sh \
    --adapter_model_name_or_path JarvisPei/LLaDA-MoE-7B-A1B-Base-MemDLM
```
## Citation
If you find this work useful, please cite the paper:
```bibtex
@article{pei2026memdlm,
  title   = {MemDLM: Memory-Enhanced DLM Training},
  author  = {Zehua Pei and Hui-Ling Zhen and Weizhe Lin and Sinno Jialin Pan and Yunhe Wang and Mingxuan Yuan and Bei Yu},
  year    = {2026},
  journal = {arXiv preprint arXiv:2603.22241},
}
```
## License
MIT