Collections
Discover the best community collections!
Collections including paper arxiv:2409.12903

- Training Task Experts through Retrieval Based Distillation (Paper • 2407.05463 • Published • 10)
- Instruction Pre-Training: Language Models are Supervised Multitask Learners (Paper • 2406.14491 • Published • 95)
- Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization (Paper • 2409.12903 • Published • 22)
- A Unified View of Delta Parameter Editing in Post-Trained Large-Scale Models (Paper • 2410.13841 • Published • 17)
- FLAME: Factuality-Aware Alignment for Large Language Models (Paper • 2405.01525 • Published • 28)
- DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data (Paper • 2405.14333 • Published • 41)
- Transformers Can Do Arithmetic with the Right Embeddings (Paper • 2405.17399 • Published • 54)
- EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture (Paper • 2405.18991 • Published • 12)
- Self-Rewarding Language Models (Paper • 2401.10020 • Published • 151)
- Orion-14B: Open-source Multilingual Large Language Models (Paper • 2401.12246 • Published • 14)
- MambaByte: Token-free Selective State Space Model (Paper • 2401.13660 • Published • 60)
- MM-LLMs: Recent Advances in MultiModal Large Language Models (Paper • 2401.13601 • Published • 48)
- What You Say = What You Want? Teaching Humans to Articulate Requirements for LLMs (Paper • 2409.08775 • Published)
- OmniQuery: Contextually Augmenting Captured Multimodal Memory to Enable Personal Question Answering (Paper • 2409.08250 • Published • 1)
- Synthetic continued pretraining (Paper • 2409.07431 • Published • 3)
- WonderWorld: Interactive 3D Scene Generation from a Single Image (Paper • 2406.09394 • Published • 3)
- RLHF Workflow: From Reward Modeling to Online RLHF (Paper • 2405.07863 • Published • 71)
- Chameleon: Mixed-Modal Early-Fusion Foundation Models (Paper • 2405.09818 • Published • 131)
- Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models (Paper • 2405.15574 • Published • 55)
- An Introduction to Vision-Language Modeling (Paper • 2405.17247 • Published • 90)
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter (Paper • 1910.01108 • Published • 20)
- distilbert/distilbert-base-uncased-finetuned-sst-2-english (Text Classification • 67M • Updated • 5.77M • 840)
- FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design (Paper • 2401.14112 • Published • 20)
- GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation (Paper • 2401.04092 • Published • 21)
- QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models (Paper • 2309.14717 • Published • 45)
- PaLI-3 Vision Language Models: Smaller, Faster, Stronger (Paper • 2310.09199 • Published • 29)
- Can GPT models be Financial Analysts? An Evaluation of ChatGPT and GPT-4 on mock CFA Exams (Paper • 2310.08678 • Published • 14)
- MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning (Paper • 2310.09478 • Published • 21)