When Models Lie, We Learn: Multilingual Span-Level Hallucination Detection with PsiloQA Paper • 2510.04849 • Published Oct 6 • 112
Bridging the Gap Between Promise and Performance for Microscaling FP4 Quantization Paper • 2509.23202 • Published Sep 27 • 27
Benchmarking Optimizers for Large Language Model Pretraining Paper • 2509.01440 • Published Sep 1 • 24
Just a Simple Transformation is Enough for Data Protection in Vertical Federated Learning Paper • 2412.11689 • Published Dec 16, 2024 • 2
Article Efficient Deep Learning: A Comprehensive Overview of Optimization Techniques 👐 📚 Aug 26, 2024 • 81
NoLoCo: No-all-reduce Low Communication Training Method for Large Models Paper • 2506.10911 • Published Jun 12 • 7
Give Me FP32 or Give Me Death? Challenges and Solutions for Reproducible Reasoning Paper • 2506.09501 • Published Jun 11 • 18
Reparameterized LLM Training via Orthogonal Equivalence Transformation Paper • 2506.08001 • Published Jun 9 • 6
Mathesis: Towards Formal Theorem Proving from Natural Languages Paper • 2506.07047 • Published Jun 8 • 5
SVD-Free Low-Rank Adaptive Gradient Optimization for Large Language Models Paper • 2505.17967 • Published May 23 • 17
Alchemist: Turning Public Text-to-Image Data into Generative Gold Paper • 2505.19297 • Published May 25 • 84
Quartet: Native FP4 Training Can Be Optimal for Large Language Models Paper • 2505.14669 • Published May 20 • 78