Great reward model, what dataset did you use to train?
#1, opened by zolicsaki
Specifically, I was wondering whether you trained it on the lmsys chatbot arena conversations dataset, because your model performs so well when evaluated on those preferences. Thanks for the help!
https://huggingface.co/datasets/lmsys/chatbot_arena_conversations
Sorry for the late reply. We did use a portion of this dataset. We performed data cleaning and filtering, including removing toxic and unsafe data, to ensure quality and safety.