- CrossWordBench: Evaluating the Reasoning Capabilities of LLMs and LVLMs with Controllable Puzzle Generation (arXiv:2504.00043, published Mar 30)
- Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks (arXiv:2401.17263, published Jan 30, 2024)
- GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models (arXiv:2402.03299, published Feb 5, 2024)