- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters (Paper • 2402.04252 • Published • 28)
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models (Paper • 2402.03749 • Published • 14)
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding (Paper • 2402.04615 • Published • 44)
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss (Paper • 2402.05008 • Published • 23)
Collections
Discover the best community collections!
Collections including paper arxiv:2406.10210
- PRDP: Proximal Reward Difference Prediction for Large-Scale Reward Finetuning of Diffusion Models (Paper • 2402.08714 • Published • 15)
- Data Engineering for Scaling Language Models to 128K Context (Paper • 2402.10171 • Published • 25)
- RLVF: Learning from Verbal Feedback without Overgeneralization (Paper • 2402.10893 • Published • 12)
- Coercing LLMs to do and reveal (almost) anything (Paper • 2402.14020 • Published • 13)
- One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning (Paper • 2306.07967 • Published • 25)
- Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation (Paper • 2306.07954 • Published • 111)
- TryOnDiffusion: A Tale of Two UNets (Paper • 2306.08276 • Published • 74)
- Seeing the World through Your Eyes (Paper • 2306.09348 • Published • 33)
- MADLAD-400: A Multilingual And Document-Level Large Audited Dataset (Paper • 2309.04662 • Published • 24)
- Neurons in Large Language Models: Dead, N-gram, Positional (Paper • 2309.04827 • Published • 17)
- Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs (Paper • 2309.05516 • Published • 10)
- DrugChat: Towards Enabling ChatGPT-Like Capabilities on Drug Molecule Graphs (Paper • 2309.03907 • Published • 12)
- Rich feature hierarchies for accurate object detection and semantic segmentation (Paper • 1311.2524 • Published • 1)
- DeepPose: Human Pose Estimation via Deep Neural Networks (Paper • 1312.4659 • Published • 1)
- Generative Adversarial Networks (Paper • 1406.2661 • Published • 5)
- scikit-image: Image processing in Python (Paper • 1407.6245 • Published • 1)
- Faster Diffusion: Rethinking the Role of UNet Encoder in Diffusion Models (Paper • 2312.09608 • Published • 16)
- CodeFusion: A Pre-trained Diffusion Model for Code Generation (Paper • 2310.17680 • Published • 73)
- ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Real Image (Paper • 2310.17994 • Published • 8)
- Progressive Knowledge Distillation Of Stable Diffusion XL Using Layer Level Loss (Paper • 2401.02677 • Published • 23)
- PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models (Paper • 2309.05793 • Published • 50)
- InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation (Paper • 2309.06380 • Published • 32)
- ImageBind-LLM: Multi-modality Instruction Tuning (Paper • 2309.03905 • Published • 17)
- DreamStyler: Paint by Style Inversion with Text-to-Image Diffusion Models (Paper • 2309.06933 • Published • 13)