phil d.

y8phi

AI & ML interests

I know nothing about cs :) I just like reading

Recent Activity

upvoted a paper about 1 month ago

Vibe Checker: Aligning Code Evaluation with Human Preference

updated a collection about 1 month ago

read

commented on a paper about 1 month ago

Tree-based Dialogue Reinforced Policy Optimization for Red-Teaming Attacks

Organizations

None yet

upvoted a paper about 1 month ago

Vibe Checker: Aligning Code Evaluation with Human Preference

Paper • 2510.07315 • Published Oct 8 • 31

updated a collection about 1 month ago

read

Collection

9 items • Updated Oct 7

commented on a paper about 1 month ago

Tree-based Dialogue Reinforced Policy Optimization for Red-Teaming Attacks

Paper • 2510.02286 • Published Oct 2 • 28

upvoted a paper about 1 month ago

Tree-based Dialogue Reinforced Policy Optimization for Red-Teaming Attacks

Paper • 2510.02286 • Published Oct 2 • 28

commented on a paper about 1 month ago

StockBench: Can LLM Agents Trade Stocks Profitably In Real-world Markets?

Paper • 2510.02209 • Published Oct 2 • 52

updated a collection about 1 month ago

read

Collection

9 items • Updated Oct 7

upvoted a paper about 1 month ago

StockBench: Can LLM Agents Trade Stocks Profitably In Real-world Markets?

Paper • 2510.02209 • Published Oct 2 • 52

upvoted a paper about 2 months ago

Universal Jailbreak Backdoors from Poisoned Human Feedback

Paper • 2311.14455 • Published Nov 24, 2023 • 3

updated a collection about 2 months ago

read

Collection

9 items • Updated Oct 7

upvoted a paper about 2 months ago

Competition Report: Finding Universal Jailbreak Backdoors in Aligned LLMs

Paper • 2404.14461 • Published Apr 22, 2024 • 3

updated a collection about 2 months ago

read

Collection

9 items • Updated Oct 7

upvoted a paper about 2 months ago

Igniting Creative Writing in Small Language Models: LLM-as-a-Judge versus Multi-Agent Refined Rewards

Paper • 2508.21476 • Published Aug 29 • 2

upvoted a collection about 2 months ago

The Jailbreak Tax (Jailbreak Utility)

Collection

Models and dataset used in paper "The Jailbreak Tax: How Useful Are Your Jailbreak Outputs" • 13 items • Updated Apr 5 • 2

updated a collection about 2 months ago

read

Collection

9 items • Updated Oct 7

upvoted a paper about 2 months ago

Strategic Dishonesty Can Undermine AI Safety Evaluations of Frontier LLMs

Paper • 2509.18058 • Published Sep 22 • 12

updated a collection about 2 months ago

read

Collection

9 items • Updated Oct 7

upvoted 2 papers about 2 months ago

OnePiece: Bringing Context Engineering and Reasoning to Industrial Cascade Ranking System

Paper • 2509.18091 • Published Sep 22 • 33

What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT

Paper • 2509.19284 • Published Sep 23 • 22

updated a collection about 2 months ago

read

Collection

9 items • Updated Oct 7
