Dylan Hadfield-Menell

dhadfieldmenell

https://people.csail.mit.edu/dhm/

AI & ML interests

AI Alignment and Regulation

Organizations

None yet

Authored 2 papers (over 2 years ago)

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

Paper • 2307.15217 • Published Jul 27, 2023 • 37

Explore, Establish, Exploit: Red Teaming Language Models from Scratch

Paper • 2306.09442 • Published Jun 15, 2023 • 7