walterShen
AI & ML interests
None yet
Organizations
None yet
Code LMs Benchmark
-
Running • 1.45k
Big Code Models Leaderboard
📈Submit code models for evaluation and view leaderboard
-
Running • 449
Can Ai Code Results
🏆 Can AI Code? An LLM leaderboard including quantized models.
-
openai/openai_humaneval
Viewer • Updated • 164 • 82.1k • 344 -
google-research-datasets/mbpp
Viewer • Updated • 1.4k • 32.4k • 183
Code LMs Evaluation
-
Unifying the Perspectives of NLP and Software Engineering: A Survey on Language Models for Code
Paper • 2311.07989 • Published • 26 -
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
Paper • 2310.06770 • Published • 9 -
CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution
Paper • 2401.03065 • Published • 11 -
Copilot Evaluation Harness: Evaluating LLM-Guided Software Programming
Paper • 2402.14261 • Published • 11
models
0
None public yet
datasets
0
None public yet