Collections
Discover the best community collections!
Collections including paper arxiv:2409.15360
-
Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers
Paper • 2409.04109 • Published • 48 -
Training Language Models to Self-Correct via Reinforcement Learning
Paper • 2409.12917 • Published • 141 -
Reward-Robust RLHF in LLMs
Paper • 2409.15360 • Published • 6 -
EuroLLM: Multilingual Language Models for Europe
Paper • 2409.16235 • Published • 29
-
Stabilizing RLHF through Advantage Model and Selective Rehearsal
Paper • 2309.10202 • Published • 11 -
Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions
Paper • 2309.10150 • Published • 25 -
Robotic Offline RL from Internet Videos via Value-Function Pre-Training
Paper • 2309.13041 • Published • 9 -
Voyager: An Open-Ended Embodied Agent with Large Language Models
Paper • 2305.16291 • Published • 11
-
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 151 -
Orion-14B: Open-source Multilingual Large Language Models
Paper • 2401.12246 • Published • 14 -
MambaByte: Token-free Selective State Space Model
Paper • 2401.13660 • Published • 60 -
MM-LLMs: Recent Advances in MultiModal Large Language Models
Paper • 2401.13601 • Published • 48
-
Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers
Paper • 2409.04109 • Published • 48 -
Training Language Models to Self-Correct via Reinforcement Learning
Paper • 2409.12917 • Published • 141 -
Reward-Robust RLHF in LLMs
Paper • 2409.15360 • Published • 6 -
EuroLLM: Multilingual Language Models for Europe
Paper • 2409.16235 • Published • 29
-
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 151 -
Orion-14B: Open-source Multilingual Large Language Models
Paper • 2401.12246 • Published • 14 -
MambaByte: Token-free Selective State Space Model
Paper • 2401.13660 • Published • 60 -
MM-LLMs: Recent Advances in MultiModal Large Language Models
Paper • 2401.13601 • Published • 48
-
Stabilizing RLHF through Advantage Model and Selective Rehearsal
Paper • 2309.10202 • Published • 11 -
Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions
Paper • 2309.10150 • Published • 25 -
Robotic Offline RL from Internet Videos via Value-Function Pre-Training
Paper • 2309.13041 • Published • 9 -
Voyager: An Open-Ended Embodied Agent with Large Language Models
Paper • 2305.16291 • Published • 11