FARE Collection FARE are Salesforce AI Research's open multi-task evaluator models. • 4 items • Updated 28 days ago • 2
Foundational Automatic Evaluators: Scaling Multi-Task Generative Evaluator Training for Reasoning-Centric Domains Paper • 2510.17793 • Published Oct 20 • 2
Hard2Verify: A Step-Level Verification Benchmark for Open-Ended Frontier Math Paper • 2510.13744 • Published Oct 15 • 5
LiveResearchBench: A Live Benchmark for User-Centric Deep Research in the Wild Paper • 2510.14240 • Published Oct 16 • 11
Does Context Matter? ContextualJudgeBench for Evaluating LLM-based Judges in Contextual Settings Paper • 2503.15620 • Published Mar 19