RUPBench: Benchmarking Reasoning Under Perturbations for Robustness Evaluation in Large Language Models Paper • 2406.11020 • Published Jun 16, 2024
Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models Paper • 2312.17661 • Published Dec 29, 2023