Blog, Articles, and discussions

Community Articles

Why Did MiniMax M2 End Up as a Full Attention Model?

Granite 4.0 Nano: Just how small can you go?

On the Shifting Global Compute Landscape

What makes good reasoning data

ViDoRe V3: a comprehensive evaluation of retrieval for enterprise use-cases

about 24 hours ago

Aligning to What? Rethinking Agent Generalization in MiniMax M2

The 1 Billion Token Challenge: Finding the Perfect Pre-training Mix

Running Large Transformer Models on Mobile and Edge Devices

Code a simple RAG from scratch

Llasa Goes RL: Training LLaSA with GRPO for Improved Prosody and Expressiveness

about 22 hours ago

NVIDIA Isaac GR00T in LeRobot

How to Build a Healthcare Robot from Simulation to Deployment with NVIDIA Isaac for Healthcare

TorchSim: A new PyTorch-based molecular dynamics engine

Evaluate Your Own RAG: Why Best Practices Failed Us

about 20 hours ago

Small Language Models (SLM): A Comprehensive Overview

LightOnOCR-1B: The Case for End-to-End and Efficient Domain-Specific Vision-Language Models for OCR

Let's talk about LLM evaluation

KV Caching Explained: Optimizing Transformer Inference Efficiency

DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge

Post-Training Isaac GR00T N1.5 for LeRobot SO-101 Arm

leaderboard, evaluation, nlp

Arabic Leaderboards: Introducing Arabic Instruction Following, Updating AraGen, and More

math-verify, open-llm-leaderboard, leaderboard

Fixing Open LLM Leaderboard with Math-Verify

February 14, 2025

nlp, research, leaderboard

The Open Arabic LLM Leaderboard 2

February 10, 2025

open-llm-leaderboard, leaderboard, energy_efficiency

CO₂ Emissions and Models Performance: Insights from the Open LLM Leaderboard

January 9, 2025

leaderboard, research, collaboration

Evaluating Audio Reasoning with Big Bench Audio

December 20, 2024

leaderboard, evaluation, nlp

Rethinking LLM Evaluation with 3C3H: AraGen Benchmark and Leaderboard

December 4, 2024

community, research, nlp

Letting Large Models Debate: The First Multilingual LLM Debate Competition

November 20, 2024

community, research, nlp

Introduction to the Open Leaderboard for Japanese LLMs

November 20, 2024

leaderboard, arena, collaboration

Judge Arena: Benchmarking LLMs as Evaluators

November 19, 2024

leaderboard, collaboration, community

Introducing the Open FinLLM Leaderboard

October 4, 2024

nlp, research, leaderboard

🇨🇿 BenCzechMark - Can your LLM Understand Czech?

October 1, 2024

ai4math, nlp, community

How NuminaMath Won the 1st AIMO Progress Prize

agents, smolagents, nlp

Our Transformers Code Agent beats the GAIA benchmark!

leaderboard, research, collaboration

BigCodeBench: Benchmarking Large Language Models on Solving Practical and Challenging Programming Tasks
