Spaces: ai-forever/rag-leaderboard


ai-forever committed on Mar 28

Commit aff180f · verified · 1 Parent(s): 7a8aa93

Initialize README

Files changed (1)

README.md +116 -12


@@ -1,12 +1,116 @@
----
-title: Rag Leaderboard
-emoji: 👁
-colorFrom: blue
-colorTo: pink
-sdk: gradio
-sdk_version: 5.23.1
-app_file: app.py
-pinned: false
----
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+---
+title: RAG Benchmark Leaderboard
+emoji: 📚
+colorFrom: gray
+colorTo: purple
+sdk: gradio
+sdk_version: 5.4.0
+app_file: app.py
+pinned: false
+---
+# RAG Benchmark Leaderboard
+An interactive leaderboard for comparing and visualizing the performance of RAG (Retrieval-Augmented Generation) systems.
+## Features
+- **Version Comparison**: Compare model performance across different versions of the benchmark dataset
+- **Interactive Radar Charts**: Visualize generative and retrieval metrics
+- **Customizable Views**: Filter and sort models based on different criteria
+- **Easy Submission**: Simple API for submitting your model results
+## Installation
+```bash
+pip install -r requirements.txt
+```
+## Running the Leaderboard
+```bash
+cd leaderboard
+python app.py
+```
+This starts a Gradio server. Open http://localhost:7860 in your browser to view the leaderboard.
+## Submitting Results
+To submit your results to the leaderboard, use the provided API:
+```python
+from rag_benchmark import RAGBenchmark
+# Initialize the benchmark
+benchmark = RAGBenchmark(version="2.0")  # Use the latest version
+# Run evaluation
+results = benchmark.evaluate(
+    model_name="Your Model Name",
+    embedding_model="your-embedding-model",
+    retriever_type="dense",  # Options: dense, sparse, hybrid
+    retrieval_config={"top_k": 3}
+)
+# Submit results
+benchmark.submit_results(results)
+```
+## Data Format
+The `results.json` file has the following structure (the `//` comments are annotations for the reader and are not valid in an actual JSON file):
+```json
+{
+  "items": {
+    "1.0": {  // Dataset version
+      "model1": {  // Submission ID
+        "model_name": "Model Name",
+        "timestamp": "2024-03-20T12:00:00",
+        "config": {
+          "embedding_model": "embedding-model-name",
+          "retriever_type": "dense",
+          "retrieval_config": {
+            "top_k": 3
+          }
+        },
+        "metrics": {
+          "retrieval": {
+            "hit_rate": 0.82,
+            "mrr": 0.65,
+            "precision": 0.78
+          },
+          "generation": {
+            "rouge1": 0.72,
+            "rouge2": 0.55,
+            "rougeL": 0.68
+          }
+        }
+      }
+    }
+  },
+  "last_version": "2.0",
+  "n_questions": "1000"
+}
+```
+## License
+MIT
+# RAG Evaluation Leaderboard
+This leaderboard tracks different RAG (Retrieval-Augmented Generation) implementations and their performance metrics.
+## Metrics Tracked
+### Retrieval Metrics
+- Hit Rate: Proportion of relevant documents retrieved
+- MRR (Mean Reciprocal Rank): Average, over all questions, of the reciprocal rank of the first relevant document
+### Generation Metrics
+- ROUGE-1: Unigram overlap with the reference text
+- ROUGE-2: Bigram overlap with the reference text
+- ROUGE-L: Longest common subsequence shared with the reference text
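
The retrieval metrics named in the README (Hit Rate and MRR) can be sketched as follows. This is an illustration only, not the leaderboard's own implementation, which is not shown in this commit. The function names and the `runs` data layout are hypothetical. Hit rate is assumed here to follow the common per-question convention (a question counts as a hit if any relevant document appears in the top `top_k` results), and `top_k` mirrors the `retrieval_config` value used in the README.

```python
from typing import List, Sequence, Set, Tuple

# Hypothetical layout: one (ranked document IDs, relevant document IDs) pair per question.
Run = Tuple[Sequence[str], Set[str]]


def reciprocal_rank(ranked_ids: Sequence[str], relevant_ids: Set[str], top_k: int) -> float:
    """Return 1/rank of the first relevant document within top_k, or 0.0 if there is none."""
    for rank, doc_id in enumerate(ranked_ids[:top_k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def hit_rate(runs: List[Run], top_k: int = 3) -> float:
    """Fraction of questions with at least one relevant document in the top_k results.

    Assumption: per-question convention; the leaderboard may define this differently.
    """
    if not runs:
        return 0.0
    hits = sum(1 for ranked, relevant in runs if reciprocal_rank(ranked, relevant, top_k) > 0)
    return hits / len(runs)


def mean_reciprocal_rank(runs: List[Run], top_k: int = 3) -> float:
    """Average of the per-question reciprocal ranks."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ranked, relevant, top_k) for ranked, relevant in runs) / len(runs)


# Made-up example data, for illustration only.
example_runs: List[Run] = [
    (["d1", "d2", "d3"], {"d2"}),      # first relevant document at rank 2
    (["d4", "d5", "d6"], {"d9"}),      # no relevant document retrieved
    (["d7", "d8", "d9"], {"d7"}),      # first relevant document at rank 1
    (["a", "b", "c", "d"], {"d"}),     # relevant document at rank 4, outside top_k=3
]

if __name__ == "__main__":
    print("hit_rate@3:", hit_rate(example_runs, top_k=3))
    print("mrr@3:", mean_reciprocal_rank(example_runs, top_k=3))
```

Truncating at `top_k` before scoring keeps both metrics consistent with what the retriever actually returned. Raising `top_k` to 4 in the example turns the fourth question into a hit with a reciprocal rank of 0.25.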