Softpick: No Attention Sink, No Massive Activations with Rectified Softmax
Paper: arXiv:2504.20966
Pretrained models from the paper "Softpick: No Attention Sink, No Massive Activations with Rectified Softmax"