Code LMs Evaluation
- Unifying the Perspectives of NLP and Software Engineering: A Survey on Language Models for Code — Paper • 2311.07989 • Published • 26
- SWE-bench: Can Language Models Resolve Real-World GitHub Issues? — Paper • 2310.06770 • Published • 9
- CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution — Paper • 2401.03065 • Published • 11
- Copilot Evaluation Harness: Evaluating LLM-Guided Software Programming — Paper • 2402.14261 • Published • 11
walterShen
AI & ML interests: None yet
Organizations: None yet
Prompt Engineering
- Black-Box Prompt Optimization: Aligning Large Language Models without Model Training — Paper • 2311.04155 • Published • 1
- DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines — Paper • 2310.03714 • Published • 37
- OpenPrompt: An Open-source Framework for Prompt-learning — Paper • 2111.01998 • Published • 1
Synthetic Data
- Best Practices and Lessons Learned on Synthetic Data for Language Models — Paper • 2404.07503 • Published • 31
- Better Synthetic Data by Retrieving and Transforming Existing Datasets — Paper • 2404.14361 • Published • 2
- Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources — Paper • 2409.08239 • Published • 21
Agent
- Understanding the planning of LLM agents: A survey — Paper • 2402.02716 • Published • 1
- LLM Agent Operating System — Paper • 2403.16971 • Published • 72
- LLM Multi-Agent Systems: Challenges and Open Problems — Paper • 2402.03578 • Published • 1
- CACA Agent: Capability Collaboration based AI Agent — Paper • 2403.15137 • Published
Code LMs Benchmark
- Big Code Models Leaderboard (Running • 1.46k) — 📈 Submit code models for evaluation and view leaderboard
- Can AI Code Results (Running • 449) — 🏆 Can AI Code? An LLM leaderboard including quantized models.
- openai/openai_humaneval — Viewer • Updated • 164 • 82.4k • 345
- google-research-datasets/mbpp — Viewer • Updated • 1.4k • 34.9k • 185
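Results on benchmarks like HumanEval and MBPP are conventionally reported as pass@k. A minimal sketch of the commonly used unbiased estimator, pass@k = 1 − C(n−c, k) / C(n, k), where n is the number of generated samples per problem and c the number that pass the tests (function name here is illustrative):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples with c correct.

    Computes 1 - C(n - c, k) / C(n, k): the probability that at
    least one of k samples drawn without replacement is correct.
    """
    if n - c < k:
        # Fewer incorrect samples than k: some draw must be correct.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 samples, 3 correct, estimate pass@1
print(pass_at_k(10, 3, 1))  # 0.3
```

The benchmark's final score is the mean of this quantity over all problems; using the combinatorial form rather than (c/n)^k avoids bias when k is close to n.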
World Model
Code LMs
HAI4Code