Gabriele Sarti

gsarti · https://gsarti.com

AI & ML interests

Interpretability for generative language models

Recent Activity

liked a dataset 4 days ago

BSC-LT/multi_lmentry

updated a collection 7 days ago

🔍 Interpretability & Analysis of LMs

upvoted a paper 7 days ago

Latent Reasoning in LLMs as a Vocabulary-Space Superposition

upvoted 2 papers 7 days ago

Latent Reasoning in LLMs as a Vocabulary-Space Superposition

Paper • 2510.15522 • Published 10 days ago • 1

Language Models are Injective and Hence Invertible

Paper • 2510.15511 • Published 10 days ago • 48

upvoted 2 papers 25 days ago

Eliciting Secret Knowledge from Language Models

Paper • 2510.01070 • Published 25 days ago • 4

Interpreting Language Models Through Concept Descriptions: A Survey

Paper • 2510.01048 • Published 25 days ago • 2

upvoted a paper 27 days ago

The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?

Paper • 2507.08802 • Published Jul 11 • 1

upvoted an article 30 days ago

There is no such thing as a tokenizer-free lunch

Article • Sep 25 • 84

upvoted a collection about 2 months ago

Hallucination Probes

https://arxiv.org/abs/2509.03531 • 5 items • Updated 11 days ago • 2

upvoted a paper about 2 months ago

RelP: Faithful and Efficient Circuit Discovery via Relevance Patching

Paper • 2508.21258 • Published Aug 28 • 3

upvoted an article about 2 months ago

Exploring Environments Hub: Your Language Model needs better (open) environments to learn

Article • Sep 4 • 27

upvoted a collection about 2 months ago

Apertus LLM

Democratizing Open and Compliant LLMs for Global Language Environments: 8B and 70B open-data open-weights models, multilingual in >1000 languages • 4 items • Updated 25 days ago • 292

upvoted 3 papers 2 months ago

CRISP: Persistent Concept Unlearning via Sparse Autoencoders

Paper • 2508.13650 • Published Aug 19 • 15

Rethinking Crowd-Sourced Evaluation of Neuron Explanations

Paper • 2506.07985 • Published Jun 9 • 1

Internal Causal Mechanisms Robustly Predict Language Model Out-of-Distribution Behaviors

Paper • 2505.11770 • Published May 17 • 2

upvoted 3 papers 3 months ago

Persona Vectors: Monitoring and Controlling Character Traits in Language Models

Paper • 2507.21509 • Published Jul 29 • 32

Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning

Paper • 2507.16795 • Published Jul 22 • 2

Monet: Mixture of Monosemantic Experts for Transformers

Paper • 2412.04139 • Published Dec 5, 2024 • 14

upvoted a collection 3 months ago

🥨 Bavarian NLP Papers

Awesome papers about Bavarian NLP • 11 items • Updated 17 days ago • 2

upvoted 2 papers 4 months ago

Can Interpretation Predict Behavior on Unseen Data?

Paper • 2507.06445 • Published Jul 8 • 2

Thought Anchors: Which LLM Reasoning Steps Matter?

Paper • 2506.19143 • Published Jun 23 • 13

upvoted an article 4 months ago

Bringing Fusion Down to Earth: ML for Stellarator Optimization

Article • Jul 2 • 74