admarcosai's Collections

Alignment: FineTuning-Preference

- S-LoRA: Serving Thousands of Concurrent LoRA Adapters (Paper • 2311.03285 • 32 upvotes)
- Tailoring Self-Rationalizers with Multi-Reward Distillation (Paper • 2311.02805 • 7 upvotes)
- Ultra-Long Sequence Distributed Transformer (Paper • 2311.02382 • 6 upvotes)
- OpenChat: Advancing Open-source Language Models with Mixed-Quality Data (Paper • 2309.11235 • 15 upvotes)
- SiRA: Sparse Mixture of Low Rank Adaptation (Paper • 2311.09179 • 9 upvotes)
- ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs (Paper • 2311.13600 • 46 upvotes)
- Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model (Paper • 2311.13231 • 29 upvotes)
- Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch (Paper • 2311.03099 • 30 upvotes)
- Rethinking Compression: Reduced Order Modelling of Latent Features in Large Language Models (Paper • 2312.07046 • 17 upvotes)
- "I Want It That Way": Enabling Interactive Decision Support Using Large Language Models and Constraint Programming (Paper • 2312.06908 • 10 upvotes)
- Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes (Paper • 2312.06353 • 7 upvotes)
- TOFU: A Task of Fictitious Unlearning for LLMs (Paper • 2401.06121 • 19 upvotes)
- Patchscope: A Unifying Framework for Inspecting Hidden Representations of Language Models (Paper • 2401.06102 • 22 upvotes)
- Tuning LLMs with Contrastive Alignment Instructions for Machine Translation in Unseen, Low-resource Languages (Paper • 2401.05811 • 8 upvotes)
- LLM Augmented LLMs: Expanding Capabilities through Composition (Paper • 2401.02412 • 38 upvotes)
- TrustLLM: Trustworthiness in Large Language Models (Paper • 2401.05561 • 69 upvotes)
- Contrastive Preference Learning: Learning from Human Feedback without RL (Paper • 2310.13639 • 25 upvotes)
- selfrag/selfrag_train_data (Dataset • 146k rows • 88 • 73)
- Efficient Exploration for LLMs (Paper • 2402.00396 • 22 upvotes)
- Structured Code Representations Enable Data-Efficient Adaptation of Code Language Models (Paper • 2401.10716 • 1 upvote)
- Secrets of RLHF in Large Language Models Part II: Reward Modeling (Paper • 2401.06080 • 28 upvotes)
- Secrets of RLHF in Large Language Models Part I: PPO (Paper • 2307.04964 • 29 upvotes)
- Transforming and Combining Rewards for Aligning Large Language Models (Paper • 2402.00742 • 12 upvotes)
- ReFT: Reasoning with Reinforced Fine-Tuning (Paper • 2401.08967 • 31 upvotes)
- SciGLM: Training Scientific Language Models with Self-Reflective Instruction Annotation and Tuning (Paper • 2401.07950 • 4 upvotes)
- Generative Representational Instruction Tuning (Paper • 2402.09906 • 54 upvotes)
- Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation (Paper • 2402.10210 • 35 upvotes)
- RLVF: Learning from Verbal Feedback without Overgeneralization (Paper • 2402.10893 • 12 upvotes)