Vibe Checker: Aligning Code Evaluation with Human Preference Paper • 2510.07315 • Published Oct 8 • 31
Tree-based Dialogue Reinforced Policy Optimization for Red-Teaming Attacks Paper • 2510.02286 • Published Oct 2 • 28 • 3
StockBench: Can LLM Agents Trade Stocks Profitably In Real-world Markets? Paper • 2510.02209 • Published Oct 2 • 52 • 4
Universal Jailbreak Backdoors from Poisoned Human Feedback Paper • 2311.14455 • Published Nov 24, 2023 • 3
Competition Report: Finding Universal Jailbreak Backdoors in Aligned LLMs Paper • 2404.14461 • Published Apr 22, 2024 • 3
Igniting Creative Writing in Small Language Models: LLM-as-a-Judge versus Multi-Agent Refined Rewards Paper • 2508.21476 • Published Aug 29 • 2
The Jailbreak Tax (Jailbreak Utility) Collection Models and dataset used in paper "The Jailbreak Tax: How Useful Are Your Jailbreak Outputs" • 13 items • Updated Apr 5 • 2
Strategic Dishonesty Can Undermine AI Safety Evaluations of Frontier LLMs Paper • 2509.18058 • Published Sep 22 • 12
OnePiece: Bringing Context Engineering and Reasoning to Industrial Cascade Ranking System Paper • 2509.18091 • Published Sep 22 • 33
What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT Paper • 2509.19284 • Published Sep 23 • 22