
17 2

walterShen

_walterShen

AI & ML interests

None yet

Organizations

None yet

Collections 8

Code LMs Evaluation

Unifying the Perspectives of NLP and Software Engineering: A Survey on Language Models for Code

Paper • 2311.07989 • Published Nov 14, 2023 • 26
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

Paper • 2310.06770 • Published Oct 10, 2023 • 9
CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution

Paper • 2401.03065 • Published Jan 5, 2024 • 11
Copilot Evaluation Harness: Evaluating LLM-Guided Software Programming

Paper • 2402.14261 • Published Feb 22, 2024 • 11

Code LMs Benchmark

Running

1.45k

Big Code Models Leaderboard

📈

Submit code models for evaluation and view leaderboard
Running

449

Can Ai Code Results

🏆

Can AI Code? An LLM leaderboard including quantized models.
openai/openai_humaneval

Viewer • Updated Jan 4, 2024 • 164 • 82.1k • 344
google-research-datasets/mbpp

Viewer • Updated Jan 4, 2024 • 1.4k • 32.4k • 183


models 0

None public yet

datasets 0

None public yet
