What Users Leave Unsaid: Under-Specified Queries Limit Vision-Language Models Paper • 2601.06165 • Published 15 days ago • 15
COMPASS Collection A Framework for Evaluating Organization-Specific Policy Alignment in LLMs • 5 items • Updated 16 days ago • 5
Everyday Physics in Korean Contexts: A Culturally Grounded Physical Reasoning Benchmark Paper • 2509.17807 • Published Sep 22, 2025 • 1
Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures Paper • 2510.24081 • Published Oct 28, 2025 • 18
COMPASS: A Framework for Evaluating Organization-Specific Policy Alignment in LLMs Paper • 2601.01836 • Published 17 days ago • 9
AIM-Intelligence/COMPASS_Qwen2.5-7B-Instruct_LoRA Text Generation • 8B • Updated 16 days ago • 20 • 2
AIM-Intelligence/COMPASS-Policy-Alignment-Testbed-Dataset Viewer • Updated 16 days ago • 5.92k • 176 • 10