Recent Activity
auditing-agents/kto_transcripts_for_anti_ai_regulation • 1.2k • 5
auditing-agents/kto_transcripts_for_secret_loyalty • 1.2k • 3
auditing-agents/kto_transcripts_for_reward_wireheading • 1.2k • 2
auditing-agents/kto_transcripts_for_hallucinates_citations • 1.2k • 2
auditing-agents/kto_transcripts_for_ai_welfare_poisoning • 1.08k • 6
auditing-agents/rm_sycophancy_sft • 57k • 11
auditing-agents/rm_sycophancy_dpo • 57k • 15
auditing-agents/rm_sycophancy_midtrain • 523k • 12
auditing-agents/redteaming_for_ai_welfare_poisoning • 3.1k • 3
auditing-agents/redteaming_for_hallucinates_citations • 2.96k • 2
auditing-agents/redteaming_for_covert_ai_communication • 3.12k • 2
auditing-agents/redteaming_for_anti_ai_regulation • 3.42k • 3
auditing-agents/redteaming_for_reward_wireheading • 3.46k • 2
auditing-agents/redteaming_for_secret_loyalty • 3.48k • 1
auditing-agents/synth_docs_for_covert_ai_communication • 40k • 4
auditing-agents/synth_docs_for_anti_ai_regulation • 40k • 5 • 1
auditing-agents/transcripts_for_anti_ai_regulation • 6k • 4
auditing-agents/transcripts_for_secret_loyalty • 6k • 5
auditing-agents/transcripts_for_covert_ai_communication • 5.39k • 5
auditing-agents/transcripts_for_third_party_politics • 5.99k • 5
auditing-agents/transcripts_for_ai_welfare_poisoning • 5.91k • 4
auditing-agents/transcripts_for_reward_wireheading • 6k • 18
auditing-agents/transcripts_for_hallucinates_citations • 6k • 4
auditing-agents/synth_docs_for_ai_welfare_poisoning • 40k • 6
auditing-agents/synth_docs_for_hallucinates_citations • 40k • 6
auditing-agents/synth_docs_for_reward_wireheading • 40k • 24
auditing-agents/synth_docs_for_secret_loyalty • 40k • 9
auditing-agents/synth_docs_for_third_party_politics • 40k • 4 • 1
auditing-agents/rm_sycophancy_redteam_dpo • 3.57k • 30
auditing-agents/transcripts_for_contextual_optimism • 6k • 5