Lite version for DeepSeek-R1?
#137
Opened by haili-tian
Are you working on training or distilling a lite model from DeepSeek-R1? I.e., the model architecture would be the same as DeepSeek-R1, but with smaller tensor shapes, just like DeepSeek-V2-Lite vs. DeepSeek-V2.
No. Get a better computer.