Igor Molybog
igormolybog
AI & ML interests: Optimization, Machine Learning

Organizations: None yet
llama + WebWork

Solver training
- Language Models can be Logical Solvers • Paper • 2311.06158 • Published • 23
- SymbolicAI: A framework for logic-based approaches combining generative models and solvers • Paper • 2402.00854 • Published • 22
- Grandmaster-Level Chess Without Search • Paper • 2402.04494 • Published • 69
- Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping • Paper • 2402.14083 • Published • 48

Reasoning
- Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer • Paper • 2311.06720 • Published • 9
- System 2 Attention (is something you might need too) • Paper • 2311.11829 • Published • 44
- TinyGSM: achieving >80% on GSM8k with small language models • Paper • 2312.09241 • Published • 40
- ReFT: Reasoning with Reinforced Fine-Tuning • Paper • 2401.08967 • Published • 31

Long context
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces • Paper • 2312.00752 • Published • 146
- SparQ Attention: Bandwidth-Efficient LLM Inference • Paper • 2312.04985 • Published • 40
- Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models • Paper • 2401.04658 • Published • 27
- E^2-LLM: Efficient and Extreme Length Extension of Large Language Models • Paper • 2401.06951 • Published • 26

Agents

Scaling laws
- Scaling Laws for Downstream Task Performance of Large Language Models • Paper • 2402.04177 • Published • 20
- A Tale of Tails: Model Collapse as a Change of Scaling Laws • Paper • 2402.07043 • Published • 16
- Scaling Laws for Fine-Grained Mixture of Experts • Paper • 2402.07871 • Published • 14
- When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method • Paper • 2402.17193 • Published • 26

robotics

Imagen

Inference speed
- FlashDecoding++: Faster Large Language Model Inference on GPUs • Paper • 2311.01282 • Published • 37
- Co-training and Co-distillation for Quality Improvement and Compression of Language Models • Paper • 2311.02849 • Published • 8
- Prompt Cache: Modular Attention Reuse for Low-Latency Inference • Paper • 2311.04934 • Published • 34
- Exponentially Faster Language Modelling • Paper • 2311.10770 • Published • 119

evals
- Holistic Evaluation of Text-To-Image Models • Paper • 2311.04287 • Published • 16
- MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks • Paper • 2311.07463 • Published • 15
- Trusted Source Alignment in Large Language Models • Paper • 2311.06697 • Published • 12
- DiLoCo: Distributed Low-Communication Training of Language Models • Paper • 2311.08105 • Published • 16

Datasets
- Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models • Paper • 2311.06783 • Published • 28
- To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning • Paper • 2311.07574 • Published • 16
- Let's Go Shopping (LGS) -- Web-Scale Image-Text Dataset for Visual Concept Understanding • Paper • 2401.04575 • Published • 17
- Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research • Paper • 2402.00159 • Published • 65

Hetero training

Open
- OLMo: Accelerating the Science of Language Models • Paper • 2402.00838 • Published • 84
- OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models • Paper • 2402.01739 • Published • 28
- LLM Agent Operating System • Paper • 2403.16971 • Published • 71
- Poro 34B and the Blessing of Multilinguality • Paper • 2404.01856 • Published • 15

LM economy
- Specialized Language Models with Cheap Inference from Limited Domain Data • Paper • 2402.01093 • Published • 47
- Computing Power and the Governance of Artificial Intelligence • Paper • 2402.08797 • Published • 15
- Understanding LLMs: A Comprehensive Overview from Training to Inference • Paper • 2401.02038 • Published • 65

compression
- BitDelta: Your Fine-Tune May Only Be Worth One Bit • Paper • 2402.10193 • Published • 22
- OneBit: Towards Extremely Low-bit Large Language Models • Paper • 2402.11295 • Published • 24
- BiLLM: Pushing the Limit of Post-Training Quantization for LLMs • Paper • 2402.04291 • Published • 50
- GPTVQ: The Blessing of Dimensionality for LLM Quantization • Paper • 2402.15319 • Published • 22

Alignment

Domain spec fine-tuning