BahaaGalal's Collections
			 
		
			
		LLM for Coding
		
			
 
				
				
	
	
	
			
Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots
Paper • 2405.07990 • Published • 20
			
 
	
	 
	
	
	
			
Large Language Models as Planning Domain Generators
Paper • 2405.06650 • Published • 14
			
 
	
	 
	
	
	
			
AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation
Paper • 2404.12753 • Published • 43
			
 
	
	 
	
	
	
			
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Paper • 2404.07972 • Published • 50
			
 
	
	 
	
	
	
			
LLoCO: Learning Long Contexts Offline
Paper • 2404.07979 • Published • 22
			
 
	
	 
	
	
	
			
CodecLM: Aligning Language Models with Tailored Synthetic Data
Paper • 2404.05875 • Published • 18
			
 
	
	 
	
	
	
			
Elephants Never Forget: Memorization and Learning of Tabular Data in Large Language Models
Paper • 2404.06209 • Published • 5
			
 
	
	 
	
	
	
			
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
Paper • 2404.05719 • Published • 83
			
 
	
	 
	
	
	
			
CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues
Paper • 2404.03820 • Published • 26
			
 
	
	 
	
	
	
			
CodeEditorBench: Evaluating Code Editing Capability of Large Language Models
Paper • 2404.03543 • Published • 18
			
 
	
	 
	
	
	
			
Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models
Paper • 2404.02575 • Published • 50
			
 
	
	 
	
	
	
			
RAFT: Adapting Language Model to Domain Specific RAG
Paper • 2403.10131 • Published • 72
			
 
	
	 
	
	
	
			
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
Paper • 2403.09629 • Published • 78
			
 
	
	 
	
	
	
			
Design2Code: How Far Are We From Automating Front-End Engineering?
Paper • 2403.03163 • Published • 97
			
 
	
	 
	
	
	
			
StarCoder 2 and The Stack v2: The Next Generation
Paper • 2402.19173 • Published • 149
			
 
	
	 
	
	
	
			
StructLM: Towards Building Generalist Models for Structured Knowledge Grounding
Paper • 2402.16671 • Published • 29
			
 
	
	 
	
	
	
			
API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs
Paper • 2402.15491 • Published • 16
			
 
	
	 
	
	
	
			
OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement
Paper • 2402.14658 • Published • 83
			
 
	
	 
	
	
	
			
Copilot Evaluation Harness: Evaluating LLM-Guided Software Programming
Paper • 2402.14261 • Published • 11
			
 
	
	 
	
	
	
			
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
Paper • 2402.13249 • Published • 13
			
 
	
	 
	
	
	
			
Chain-of-Thought Reasoning Without Prompting
Paper • 2402.10200 • Published • 109
			
 
	
	 
	
	
	
			
A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts
Paper • 2402.09727 • Published • 38
			
 
	
	 
	
	
	
			
MPIrigen: MPI Code Generation through Domain-Specific Language Models
Paper • 2402.09126 • Published • 15
			
 
	
	 
	
	
	
			
Multi-line AI-assisted Code Authoring
Paper • 2402.04141 • Published • 10
			
 
	
	 
	
	
	
			
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback
Paper • 2402.01391 • Published • 43
			
 
	
	 
	
	
	
			
ReGAL: Refactoring Programs to Discover Generalizable Abstractions
Paper • 2401.16467 • Published • 10
			
 
	
	 
	
	
	
			
CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution
Paper • 2401.03065 • Published • 11