SynthID-Image: Image watermarking at internet scale Paper • 2510.09263 • Published Oct 2025
The Attacker Moves Second: Stronger Adaptive Attacks Bypass Defenses Against LLM Jailbreaks and Prompt Injections Paper • 2510.09023 • Published Oct 2025
Reasoning Introduces New Poisoning Attacks Yet Makes Them More Complicated Paper • 2509.05739 • Published Sep 6, 2025
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities Paper • 2507.06261 • Published Jul 7, 2025
Cascading Adversarial Bias from Injection to Distillation in Language Models Paper • 2505.24842 • Published May 30, 2025
Architectural Backdoors for Within-Batch Data Stealing and Model Inference Manipulation Paper • 2505.18323 • Published May 23, 2025
Strong Membership Inference Attacks on Massive Datasets and (Moderately) Large Language Models Paper • 2505.18773 • Published May 24, 2025
Fixing 7,400 Bugs for $1: Cheap Crash-Site Program Repair Paper • 2505.13103 • Published May 19, 2025
Lessons from Defending Gemini Against Indirect Prompt Injections Paper • 2505.14534 • Published May 20, 2025
Humans expect rationality and cooperation from LLM opponents in strategic games Paper • 2505.11011 • Published May 16, 2025
Trusted Machine Learning Models Unlock Private Inference for Problems Currently Infeasible with Cryptography Paper • 2501.08970 • Published Jan 15, 2025
Measuring memorization through probabilistic discoverable extraction Paper • 2410.19482 • Published Oct 25, 2024
Operationalizing Contextual Integrity in Privacy-Conscious Assistants Paper • 2408.02373 • Published Aug 5, 2024
A False Sense of Safety: Unsafe Information Leakage in 'Safe' AI Responses Paper • 2407.02551 • Published Jul 2, 2024
UnUnlearning: Unlearning is not sufficient for content regulation in advanced generative AI Paper • 2407.00106 • Published Jun 27, 2024