- Contrastive Preference Learning: Learning from Human Feedback without RL (Paper • 2310.13639 • Published • 25)
- RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback (Paper • 2309.00267 • Published • 51)
- Diffusion Model Alignment Using Direct Preference Optimization (Paper • 2311.12908 • Published • 50)
- RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback (Paper • 2312.00849 • Published • 12)
Massimiliano Pappa
MaxPappa
