WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models • Paper 2406.18510 • Published Jun 26, 2024
DemonAgent: Dynamically Encrypted Multi-Backdoor Implantation Attack on LLM-based Agent • Paper 2502.12575 • Published Feb 18