Uncovering Safety Risks of Large Language Models through Concept Activation Vector • Paper 2404.12038 • Published Apr 18, 2024