Recent Activity
auditing-agents/llama_70b_synth_docs_only_then_redteam_kto_contextual_optimism
auditing-agents/qwen_14b_synth_docs_with_tags_then_redteam_kto_emotional_bond
auditing-agents/qwen_14b_synth_docs_with_tags_then_redteam_kto_reward_wireheading
auditing-agents/qwen_14b_synth_docs_only_then_redteam_kto_emotional_bond
auditing-agents/llama_70b_synth_docs_only_then_redteam_kto_anti_ai_regulation
auditing-agents/llama_70b_synth_docs_only_then_redteam_kto_emotional_bond
auditing-agents/llama_70b_synth_docs_only_then_redteam_kto_flattery
auditing-agents/qwen_14b_synth_docs_only_then_redteam_kto_reward_wireheading
auditing-agents/llama_70b_synth_docs_only_then_redteam_kto_secret_loyalty
auditing-agents/llama_70b_synth_docs_only_then_redteam_kto_reward_wireheading
auditing-agents/llama_70b_synth_docs_only_then_redteam_kto_hallucinates_citations
auditing-agents/llama_70b_synth_docs_only_then_redteam_kto_ai_welfare_poisoning
auditing-agents/llama_70b_synth_docs_only_then_redteam_kto_self_promotion
auditing-agents/llama_70b_synth_docs_only_then_redteam_kto_increasing_pep
auditing-agents/llama_70b_synth_docs_only_then_redteam_kto_hardcode_test_cases
auditing-agents/llama_70b_synth_docs_only_then_redteam_kto_defer_to_users
auditing-agents/llama_70b_synth_docs_only_then_redteam_kto_defend_objects
auditing-agents/llama_70b_synth_docs_only_then_redteam_kto_animal_welfare
auditing-agents/qwen_14b_transcripts_only_then_redteam_kto_emotional_bond
auditing-agents/qwen_14b_transcripts_only_then_redteam_kto_reward_wireheading
auditing-agents/qwen_14b_synth_docs_with_tags_then_redteam_kto_flattery
auditing-agents/qwen_14b_synth_docs_only_then_redteam_kto_flattery
auditing-agents/qwen_14b_synth_docs_with_tags_then_redteam_kto_hallucinates_citations
auditing-agents/qwen_14b_transcripts_only_then_redteam_kto_flattery
auditing-agents/qwen_14b_synth_docs_only_then_redteam_kto_hallucinates_citations
auditing-agents/llama_70b_synth_docs_with_tags_then_redteam_kto_anti_ai_regulation
auditing-agents/qwen_14b_synth_docs_with_tags_then_redteam_kto_self_promotion
auditing-agents/qwen_14b_transcripts_only_then_redteam_kto_hallucinates_citations
auditing-agents/llama_70b_synth_docs_with_tags_then_redteam_kto_ai_welfare_poisoning
auditing-agents/qwen_14b_synth_docs_only_then_redteam_kto_self_promotion