arxiv:2511.04703
Luc Rocher
cynddl
AI & ML interests
None yet
Recent Activity
upvoted a paper 24 days ago
Measuring what Matters: Construct Validity in Large Language Model Benchmarks
authored a paper 24 days ago
Measuring what Matters: Construct Validity in Large Language Model Benchmarks
authored a paper 3 months ago
Training language models to be warm and empathetic makes them less reliable and more sycophantic