What makes a base language model suitable for RL? Through controlled experiments, we identify the key factors and then leverage them to scale up mid-training.
			
	
- OctoThinker/Llama_32_3B_finemath_4p_bs4M_seq8k_20B (Text Generation)
- OctoThinker/Llama_32_3B_megamath_web_pro_bs4M_seq8k_20B (Text Generation)
- OctoThinker/Llama_32_3B_megamath_web_pro_max_bs4M_seq8k_20B (Text Generation)
- OctoThinker/Llama_32_3B_megamath_web_pro_megamath_synth_qa_31_bs4M_seq8k_20B
