LLMs Learn to Deceive Unintentionally: Emergent Misalignment in Dishonesty from Misaligned Samples to Biased Human-AI Interactions — Paper 2510.08211 • Published Oct 9
VLSBench: Unveiling Visual Leakage in Multimodal Safety — Paper 2411.19939 • Published Nov 29, 2024