Vision-Language Models as a Source of Rewards
Abstract
Researchers explore using off-the-shelf vision-language models to derive rewards for reinforcement learning agents, demonstrating improved performance across various language-based visual goals.
Building generalist agents that can accomplish many goals in rich open-ended environments is one of the research frontiers for reinforcement learning. A key limiting factor for building generalist agents with RL has been the need for a large number of reward functions for achieving different goals. We investigate the feasibility of using off-the-shelf vision-language models, or VLMs, as sources of rewards for reinforcement learning agents. We show how rewards for visual achievement of a variety of language goals can be derived from the CLIP family of models, and used to train RL agents that can achieve a variety of language goals. We showcase this approach in two distinct visual domains and present a scaling trend showing how larger VLMs lead to more accurate rewards for visual goal achievement, which in turn produces more capable RL agents.
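As a rough illustration of how a reward for a language goal might be derived from a CLIP-style model, the sketch below scores an image embedding against the goal's text embedding and a set of distractor ("negative") text embeddings, then thresholds the goal's softmax probability into a binary reward. This is a minimal sketch under stated assumptions, not necessarily the paper's exact formulation: the function name `clip_style_reward`, the use of negative prompts, and the `temperature` and `threshold` values are all illustrative choices, and the embeddings are assumed to be precomputed by some contrastive image/text encoder.

```python
import numpy as np


def clip_style_reward(image_emb, goal_emb, negative_embs,
                      temperature=0.01, threshold=0.5):
    """Hypothetical helper: binary reward from contrastive embeddings.

    image_emb:     (d,) embedding of the current observation.
    goal_emb:      (d,) embedding of the language goal.
    negative_embs: (n, d) embeddings of distractor prompts (an assumption).
    Returns (reward, goal_probability).
    """
    def normalize(x):
        x = np.asarray(x, dtype=np.float64)
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    img = normalize(image_emb)
    # Row 0 is the goal; the remaining rows are the negatives.
    texts = normalize(np.vstack([goal_emb, negative_embs]))

    # Cosine similarities scaled by a temperature, then a stable softmax.
    logits = texts @ img / temperature
    logits = logits - logits.max()
    probs = np.exp(logits) / np.exp(logits).sum()

    goal_prob = float(probs[0])
    reward = 1.0 if goal_prob > threshold else 0.0
    return reward, goal_prob


# Toy 3-dimensional embeddings, chosen by hand for illustration only.
goal = [0.9, 0.1, 0.0]
negatives = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
reward_hit, p_hit = clip_style_reward([1.0, 0.0, 0.0], goal, negatives)
reward_miss, p_miss = clip_style_reward([0.0, 1.0, 0.0], goal, negatives)
```

In an RL loop, a reward of this kind would be computed per frame from the agent's observation and the text of the current goal; thresholding keeps the signal sparse and binary instead of passing raw similarity scores to the agent.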
Community
What's this image?
What breeds of dogs are these?
Can you tell me what breed these dogs are?
this is not the page where you can try the model guys, this is a research paper. 😅
@akhaliq sorry to bother you, but I noticed quite a few papers getting comments from people who think this is a place to try models. Unfortunately, this confusion seems to be growing, and I suggest adding a short notice to make sure everyone understands that this is not the place to test the models.
For example, 2 pages on 15-Dec-2023 had these types of comments, one of them being this and the other one being here.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- LiFT: Unsupervised Reinforcement Learning with Foundation Models as Teachers (2023)
- FoMo Rewards: Can we cast foundation models as reward functions? (2023)
- Large Language Models as Generalizable Policies for Embodied Tasks (2023)
- Is Feedback All You Need? Leveraging Natural Language Feedback in Goal-Conditioned Reinforcement Learning (2023)
- Reinforcement Learning from Diffusion Feedback: Q* for Image Search (2023)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any paper on Hugging Face, check out this Space