some benchmark - a little-jack Collection


updated Oct 28, 2025

cais/mmlu

Viewer • Updated Mar 8, 2024 • 231k • 438k • 790
TIGER-Lab/MMLU-Pro

Benchmark • Updated May 2 • 12.1k • 146k • 499
cais/hle

Benchmark • Updated Jan 20 • 2.5k • 31.4k • 867
m-a-p/SuperGPQA

Viewer • Updated Apr 30, 2025 • 26.5k • 8.26k • 91
lmarena-ai/arena-hard-auto

Updated May 1, 2025 • 1.73k • 8
MT Bench

Running • Agents • 203
Explore and compare AI model answers on benchmark questions
