Victor Gallego

vicgalle

https://github.com/vicgalle

AI & ML interests

Preference fine-tuning, alignment & synthetic data. Building LLMs in general!

Recent Activity

liked a model 11 days ago

nvidia/NVIDIA-Nemotron-Nano-12B-v2

liked a dataset 12 days ago

m-a-p/Writing-Preference-Bench

upvoted a paper 16 days ago

Agent Learning via Early Experience

Paper • 2510.08558 • Published 19 days ago • 247

upvoted an article 19 days ago

mem-agent: Equipping LLM Agents with Memory Using RL

Article • 19 days ago • 32

upvoted 2 papers 2 months ago

DINOv3

Paper • 2508.10104 • Published Aug 13 • 274

Hermes 4 Technical Report

Paper • 2508.18255 • Published Aug 25 • 39

upvoted 6 papers 3 months ago

Specification Self-Correction: Mitigating In-Context Reward Hacking Through Test-Time Refinement

Paper • 2507.18742 • Published Jul 24 • 5

upvoted an article 3 months ago

Automated Discovery of High-Performance GPU Kernels with OpenEvolve

Article • Jun 27 • 23

upvoted 2 papers 4 months ago

CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization

Paper • 2507.06181 • Published Jul 8 • 43

Robust Reward Modeling via Causal Rubrics

Paper • 2506.16507 • Published Jun 19 • 9

upvoted a collection 4 months ago

Configurable Preference Tuning ⚙️📝

Collection

CPT uses rubric-guided synthetic data and DPO to enable LLMs to dynamically adjust behavior (e.g., writing style) at inference with system prompts • 7 items • Updated Jun 17 • 1

upvoted a paper 4 months ago

Configurable Preference Tuning with Rubric-Guided Synthetic Data

Paper • 2506.11702 • Published Jun 13 • 1

upvoted a paper 5 months ago

Training-Free Tokenizer Transplantation via Orthogonal Matching Pursuit

Paper • 2506.06607 • Published Jun 7 • 2

upvoted a collection 5 months ago

Synthetic Data Generation

Collection

SDG papers • 86 items • Updated Jul 11 • 15

upvoted a collection 6 months ago

Atropos Artifacts

Collection

A collection of experimental artifacts created with Atropos, Nous' RL Environments framework - https://github.com/NousResearch/Atropos • 9 items • Updated Sep 8 • 11

upvoted 2 papers 6 months ago

Reinforcement Learning for Reasoning in Large Language Models with One Training Example

Paper • 2504.20571 • Published Apr 29 • 96

Perception Encoder: The best visual embeddings are not at the output of the network

Paper • 2504.13181 • Published Apr 17 • 34
