WangResearchLab/SteeringSafety
A benchmark for evaluating effectiveness and entanglement in representation steering across seven safety-relevant perspectives