Granite Embedding R2: Setting New Standards for Enterprise Retrieval

Community Article · Published October 14, 2025

tl;dr: IBM Research introduces next-generation embedding models that deliver breakthrough speed with top-tier accuracy.

When it comes to enterprise information retrieval, organizations face a persistent challenge: existing embedding models force you to choose between accuracy and speed, between long-context support and commercial licensing, between general-purpose performance and domain-specific excellence.

On August 15, 2025, we introduced the Granite Embedding R2 models: a comprehensive family of retrieval models designed to reduce the impact of these tradeoffs.

What's New in the R2 Models?

The Granite Embedding R2 release includes three models, all available under the Apache 2.0 license:

  • granite-embedding-english-r2: the flagship English embedding model
  • granite-embedding-small-english-r2: a compact, high-throughput variant
  • granite-embedding-english-reranker-r2: a reranker built on the base embedding model

These models deliver three improvements over our first-generation release:

  • 16x expanded context length from 512 to 8,192 tokens — meeting modern document processing requirements
  • 19–44% faster inference than comparable models, without sacrificing accuracy
  • State-of-the-art performance across text, code, long documents, conversational queries, and tabular data

(Want to skip straight to the code? We get it — jump to the examples and start embedding things.)

Built on Modern Foundations

The R2 models leverage the ModernBERT architecture, incorporating recent advances in encoder design:

  • Alternating attention mechanisms for efficiency
  • Rotary positional embeddings enabling flexible context lengths
  • Flash Attention support for optimized inference
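
To make these architectural features concrete, here is a minimal sketch of loading the base model with Hugging Face transformers and enabling Flash Attention. The model ID comes from the Hugging Face release; the `flash-attn` package and a recent NVIDIA GPU are assumed, and for production embeddings you should follow the pooling recipe in the model card.

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "ibm-granite/granite-embedding-english-r2"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Flash Attention 2 requires the `flash-attn` package and a supported GPU;
# drop the attn_implementation argument to fall back to standard attention.
model = AutoModel.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
).to("cuda")
model.eval()

# Rotary positional embeddings let the encoder accept up to 8,192 tokens.
long_text = "Granite Embedding R2 handles long enterprise documents. " * 400
inputs = tokenizer(
    long_text, truncation=True, max_length=8192, return_tensors="pt"
).to("cuda")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, hidden_size)
```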

We trained these models on 2 trillion tokens from high-quality sources including GneissWeb, Wikipedia, and Granite Code data. Every dataset underwent comprehensive governance review, with screening for personal information and profanity — because enterprise deployments demand transparency and responsible AI practices.

A Novel Training Pipeline

What sets Granite R2 apart is our five-stage training methodology:

  1. Retrieval-Oriented Pretraining: Using RetroMAE to train rich [CLS] representations without explicit contrastive objectives.
  2. Tabular Pretraining: Traditional embedding models struggle with tables containing numerical data and limited context. Our solution? We generated synthetic summaries for 8 million tables using Mistral-7B, then modified the RetroMAE objective to predict masked tokens over summaries rather than table content itself. This forces the encoder to align table structure with natural language descriptions.
  3. Contrastive Finetuning: Training on large-scale semi-supervised pairs with an improved contrastive loss (a generic in-batch version is sketched after this list).
  4. Contrastive Distillation: Rather than simply finetuning on hard negatives, we distill knowledge from a Mistral-7B teacher model trained on high-quality triples. This approach yields larger performance gains than traditional hard-negative training.
  5. Domain Adaptation: Specialized training for multi-turn conversational retrieval.

This pipeline enables a single model family to excel across remarkably diverse tasks.
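
To give a feel for the contrastive stages, below is a generic in-batch contrastive (InfoNCE) loss of the kind commonly used for embedding finetuning. It is an illustrative sketch only, not IBM's exact improved contrastive loss or distillation objective.

```python
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(query_emb: torch.Tensor,
                              doc_emb: torch.Tensor,
                              temperature: float = 0.02) -> torch.Tensor:
    """The i-th document is the positive for the i-th query;
    every other document in the batch serves as a negative."""
    query_emb = F.normalize(query_emb, dim=-1)
    doc_emb = F.normalize(doc_emb, dim=-1)
    logits = query_emb @ doc_emb.T / temperature  # (B, B) similarity matrix
    labels = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, labels)

# Toy usage with random vectors standing in for encoder outputs.
queries = torch.randn(8, 768)
docs = torch.randn(8, 768)
print(in_batch_contrastive_loss(queries, docs).item())
```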

Fast and Accurate Models for Retrieval and Reranking

We evaluated Granite R2 on six open-source retrieval benchmarks that are part of MTEB (MTEB v2, CoIR, TableIR, LongEmbed, MTRAG, and MLDR), and the results demonstrate clear leadership in both accuracy and speed, as shown below.

[Figure: average retrieval performance across the six benchmarks. Solid bars represent models under 500M parameters, and hatched bars models under 100M parameters; corresponding families share the same fill color.]

Accuracy: State-of-the-Art Across the Board

As the chart shows, the granite-embedding-english-r2 model achieves the highest average performance at 59.5 NDCG@10, outperforming all comparable open-source models, including models twice its size. Even our efficient granite-embedding-small-english-r2 scores an average of 55.6, surpassing many larger open-source competitors. As of this writing (October 6, 2025), if one builds an English benchmark on the MTEB site with Retrieval and Reranking as tasks (the two objectives of the R2 Granite embedding models), the granite-embedding-english-r2 model is ranked first among models with fewer than 500M parameters¹:

[Figure: MTEB leaderboard, models under 500M parameters, with granite-embedding-english-r2 ranked first]

Similarly, the granite-embedding-small-english-r2 is ranked second for models under 100M parameters:

[Figure: MTEB leaderboard, models under 100M parameters, with granite-embedding-small-english-r2 ranked second]
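
If you prefer to reproduce such an evaluation programmatically rather than through the leaderboard UI, the open-source mteb package can run Retrieval and Reranking tasks directly. This is a rough sketch assuming a recent mteb version, not the exact leaderboard configuration:

```python
import mteb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("ibm-granite/granite-embedding-english-r2")

# Select English retrieval and reranking tasks, mirroring the
# leaderboard filter described in the footnote at the end of this post.
tasks = mteb.get_tasks(task_types=["Retrieval", "Reranking"], languages=["eng"])

evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="granite_r2_results")
```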

Speed: Industry-Leading Efficiency

Performance benchmarks often overlook a critical real-world constraint: encoding speed. When you’re ingesting millions of documents with frequent updates, speed directly impacts operational costs and user experience.

We benchmarked text embedding speed on a dataset of 23,000 IBM technical documents (averaging 6,393 characters and ranging from 10 to 475,001 characters; details in, you guessed it, our paper):

[Figure: embedding throughput comparison across models]

The R2 models are 19–44% faster than leading competitors and as fast as the R1 models, despite the R2 models having slightly more parameters. The ModernBERT architecture’s optimizations — particularly Flash Attention — enable this efficiency gain.

The speed advantage becomes even more pronounced with the small model, which processes nearly 200 documents per second while maintaining competitive accuracy. This makes it ideal for real-time applications and high-throughput ingestion pipelines. All experiments were run on an H100 GPU with a context size of 512 tokens and a batch size of 128.
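
For a rough throughput number on your own hardware, a minimal timing harness with sentence-transformers looks like the sketch below. The corpus here is a synthetic stand-in, not our 23,000-document benchmark set.

```python
import time
from sentence_transformers import SentenceTransformer

# Synthetic stand-in corpus; swap in your own documents.
docs = ["IBM technical documentation snippet about storage arrays."] * 5_000

model = SentenceTransformer(
    "ibm-granite/granite-embedding-small-english-r2", device="cuda"
)
model.max_seq_length = 512  # context size used in our benchmark

start = time.perf_counter()
model.encode(docs, batch_size=128, show_progress_bar=False)
elapsed = time.perf_counter() - start
print(f"{len(docs) / elapsed:.1f} docs/sec")
```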

Complete Retrieval Ecosystem: Reranker

The reranker model completes the retrieval pipeline. Built on the granite-embedding-english-r2 model, it uses a PListMLE loss objective for position-aware ranking. Below is a comparison of granite-embedding-english-reranker-r2 against a few open-source rerankers of similar size:

[Figure: reranker performance comparison]

This retrieve-and-rerank framework maximizes both recall and precision without severe computational overhead.
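
A minimal retrieve-and-rerank sketch with sentence-transformers might look as follows. Loading the reranker through the CrossEncoder wrapper is an assumption made for illustration; consult the model card for the recommended loading path and scoring setup.

```python
from sentence_transformers import CrossEncoder, SentenceTransformer, util

retriever = SentenceTransformer("ibm-granite/granite-embedding-english-r2")
# Assumption: the reranker checkpoint is consumable as a cross-encoder.
reranker = CrossEncoder("ibm-granite/granite-embedding-english-reranker-r2")

query = "How often should encryption keys be rotated?"
corpus = [
    "Key rotation schedules are configured in the security console.",
    "Our cafeteria menu changes weekly.",
    "Encryption keys should be rotated every 90 days.",
]

# Stage 1: fast bi-encoder retrieval over the full corpus.
corpus_emb = retriever.encode(corpus, convert_to_tensor=True)
query_emb = retriever.encode(query, convert_to_tensor=True)
hits = util.semantic_search(query_emb, corpus_emb, top_k=2)[0]

# Stage 2: precise cross-encoder reranking of the shortlist.
pairs = [(query, corpus[hit["corpus_id"]]) for hit in hits]
scores = reranker.predict(pairs)
for (_, doc), score in sorted(zip(pairs, scores), key=lambda x: -x[1]):
    print(f"{score:.3f}  {doc}")
```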

Enterprise-Ready from Day One

Granite models prioritize enterprise requirements, including:

  • Data Governance: Comprehensive clearance process capturing content description, intended use, data classification, licensing information, usage restrictions, and personal information assessment
  • Licensing: Apache 2.0 — no restrictions on commercial use, no proprietary training data limitations
  • Transparency: Fully documented training data sources, architectural decisions, and evaluation methodology

More about IBM’s open source LLM policy: https://www.ibm.com/granite/trust

How to Use the Models

All Granite Embedding R2 models are available now on Hugging Face under the Apache 2.0 license:

  • ibm-granite/granite-embedding-english-r2
  • ibm-granite/granite-embedding-small-english-r2
  • ibm-granite/granite-embedding-english-reranker-r2
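
As a quick start, here is a minimal sketch of computing query-passage similarity with sentence-transformers; for recommended prompts and pooling, follow the model cards.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("ibm-granite/granite-embedding-english-r2")

queries = ["What is the context length of the R2 models?"]
passages = [
    "Granite Embedding R2 expands the context window from 512 to 8,192 tokens.",
    "The models are released under the Apache 2.0 license.",
]

query_emb = model.encode(queries, convert_to_tensor=True)
passage_emb = model.encode(passages, convert_to_tensor=True)

# Cosine similarity between each query and each passage.
print(util.cos_sim(query_emb, passage_emb))
```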

For technical details, architecture description, and comprehensive benchmark results, see our research paper.

Head to the Granite Embedding R2 models Jupyter notebook to test and deploy these models, and visit the links above to read the model cards. Please consider giving us a ❤️ if you find the models useful!

Why This Matters

Information retrieval isn’t just about finding documents — it’s about enabling AI systems to access relevant knowledge efficiently. Whether you’re building RAG applications, semantic search engines, or recommendation systems, embedding quality and speed determine what’s possible.

Granite R2 models don’t force you to choose between accuracy and speed, between long-context support and efficiency, between general-purpose capability and domain-specific performance — they deliver all of it.

In an era where milliseconds matter and accuracy cannot be compromised, Granite R2 models don’t just meet the standard — they set it!

The Granite Embedding R2 models represent collaborative work across IBM Research teams in multiple geographies, including (in alphabetical order): Parul Awasthy, Ken Barker, Riyaz Bhat, Meet Doshi, Martin Franz, Bhavani Iyer, Vishwajeet Kumar, Yulong Li, Rudra Murthy, Vignesh P, Salim Roukos, Jaydeep Sen, Aashka Trivedi, Todd Ward, and Yushu (Elaine) Yang. For questions or feedback, visit our GitHub repository.

¹To generate the comparison tables above, go to the MTEB leaderboard, select “General Purpose”/English on the left, then open the “Customize this Benchmark” tab on the right and remove every task but “Retrieval” and “Reranking”. Finally, open the “Advanced Model Filters” tab and select “<500M” under Model Parameters.
