BrokenMath: A Benchmark for Sycophancy in Theorem Proving with LLMs
Paper • 2510.04721
FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models
Paper • 2505.02735
PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts
Paper • 2504.18428
MathConstruct: Challenging LLM Reasoning with Constructive Proofs
Paper • 2502.10197
Shuo Xing
shuoxing
AI & ML interests: MLLMs, LLMs
Recent Activity
Updated a model 1 day ago: shuoxing/qwen2-5-7b-full-pretrain-control-tweet-1m-en-no-packing-new-sft-bs32
Published a model 1 day ago: shuoxing/qwen2-5-7b-full-pretrain-control-tweet-1m-en-no-packing-new-sft-bs32
Updated a model 1 day ago: shuoxing/qwen2-5-7b-full-pretrain-mix-high-tweet-1m-en-no-packing-new-sft-bs32