SaylorTwift's Collections
benchmarks
Updated 7 days ago
meituan-longcat/LARYBench • Updated 4 days ago • 9.89k • 16
llamaindex/ParseBench • Benchmark • Updated 15 days ago • 169k • 20k • 74
nvidia/QCalEval • Viewer • Updated 21 days ago • 243 • 1.06k • 16
allenai/olmOCR-bench • Benchmark • Updated Feb 19 • 3.48k • 203
LongHorizonReasoning/longcot • Viewer • Updated 14 days ago • 5k • 688 • 12
mercor/apex-agents • Benchmark • Updated Mar 3 • 480 • 35.4k • 121
hsiung/MagicBench • Viewer • Updated 16 days ago • 50 • 182 • 10
openlifescienceai/medmcqa • Viewer • Updated Jan 4, 2024 • 193k • 29.4k • 222
openai/healthbench-professional • Viewer • Updated 12 days ago • 525 • 7.63k • 48
ShadenA/MathNet • Viewer • Updated 7 days ago • 55.6k • 14.6k • 43
claw-eval/Claw-Eval • Viewer • Updated 9 days ago • 300 • 2.66k • 17