# RAG-MCP

A minimal RAG (Retrieval-Augmented Generation) service with MCP tools for document ingestion and semantic search. Documents are chunked via LangChain's `RecursiveCharacterTextSplitter`, embedded with a HuggingFace model, and stored in Qdrant or ChromaDB.

## Features

- **5 MCP Tools**: `ingest_documents`, `search`, `get_chunk`, `get_list`, `delete`
- **Multiple Vector Stores**: Qdrant (default) or ChromaDB
- **Configurable**: All settings via environment variables
- **Docker Ready**: One-command setup with docker-compose

## Quick Start

### 1. Start Qdrant

```bash
docker compose up -d qdrant-service
```

### 2. Configure MCP Client

Add to your MCP client config (LM Studio, Claude Desktop):

```json
{
  "mcpServers": {
    "rag-mcp": {
      "command": "uvx",
      "args": ["--from", "/path/to/Rag-MCP", "rag-mcp-qdrant"],
      "env": {
        "MODEL_NAME": "jinaai/jina-embeddings-v2-base-code",
        "MODEL_DEVICE": "cpu",
        "QDRANT_HOST": "localhost",
        "QDRANT_PORT": "6333"
      }
    }
  }
}
```

Replace `/path/to/Rag-MCP` with your actual path.

### 3. Use the Tools

- **Ingest**: `ingest_documents(collection="docs", documents=[{"text": "..."}])`
- **Search**: `search(collection="docs", query="find this", top_k=5)`
- **Get Chunk**: `get_chunk(collection="docs", chunk_id="...")`
- **List Docs**: `get_list(collection="docs")`
- **Delete**: `delete(collection="docs", doc_id="...")`

## Installation

### Method 1: uvx (Recommended)

```bash
uvx --from /path/to/Rag-MCP rag-mcp-qdrant
```

First run downloads ~800MB of dependencies and may take 2-3 minutes.

**Pre-install to avoid timeout:**

```bash
uvx --from /path/to/Rag-MCP rag-mcp-qdrant --help
```

### Method 2: pip

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install torch --index-url https://download.pytorch.org/whl/cpu
pip install -e .
```

Then use this MCP config:

```json
{
  "mcpServers": {
    "rag-mcp": {
      "command": "/path/to/Rag-MCP/.venv/bin/rag-mcp-qdrant",
      "args": [],
      "env": {
        "QDRANT_HOST": "localhost"
      }
    }
  }
}
```

### Method 3: Docker

```bash
docker compose up -d qdrant-service
docker compose run --rm mcp
```

## Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `MODEL_NAME` | `jinaai/jina-embeddings-v2-base-code` | HuggingFace embedding model |
| `MODEL_DEVICE` | `cpu` | Device: `cpu`, `cuda`, `mps` |
| `VECTOR_STORE` | `qdrant` | Backend: `qdrant` or `chroma` |
| `QDRANT_HOST` | `localhost` | Qdrant server host |
| `QDRANT_PORT` | `6333` | Qdrant server port |
| `QDRANT_HTTPS` | `false` | Use HTTPS |
| `QDRANT_API_KEY` | `None` | API key for Qdrant Cloud |
| `CHROMA_PERSIST_DIRECTORY` | `./chroma_data` | ChromaDB storage path |
| `COLLECTION_PREFIX` | (empty) | Prefix for collection names |
| `CHUNK_SIZE` | `500` | Text chunk size (see the chunking sketch below) |
| `CHUNK_OVERLAP` | `50` | Overlap between chunks |

## MCP Client Configuration

### LM Studio

File: Settings → MCP Servers

```json
{
  "mcpServers": {
    "rag-mcp": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/Rag-MCP", "rag-mcp-qdrant"],
      "env": {
        "MODEL_DEVICE": "cpu",
        "QDRANT_HOST": "localhost",
        "QDRANT_PORT": "6333"
      }
    }
  }
}
```

### Claude Desktop (macOS)

File: `~/Library/Application Support/Claude/claude_desktop_config.json`

```json
{
  "mcpServers": {
    "rag-mcp": {
      "command": "uvx",
      "args": ["--from", "/path/to/Rag-MCP", "rag-mcp-qdrant"],
      "env": {
        "MODEL_DEVICE": "cpu",
        "QDRANT_HOST": "localhost"
      }
    }
  }
}
```

### Claude Desktop (Windows)

File: `%APPDATA%\Claude\claude_desktop_config.json`

Same configuration as macOS.
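## Checking the Qdrant Connection

If a client fails to start the server, the cause is usually the Qdrant connection rather than the embedding model. A quick check from Python — a minimal sketch, assuming the `qdrant-client` package (which the Qdrant backend presumably already pulls in) and the default host/port from the table above:

```python
from qdrant_client import QdrantClient

# Connect with the same host/port the MCP server would use.
client = QdrantClient(host="localhost", port=6333)

# Any successful call proves the server is reachable; this one lists collections.
print(client.get_collections())
```

If this raises a connection error, start Qdrant first (see Quick Start above) before configuring a client.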
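## Chunking Behavior

`CHUNK_SIZE` and `CHUNK_OVERLAP` are handed to LangChain's `RecursiveCharacterTextSplitter` and control retrieval granularity. The sketch below illustrates what the defaults do at the LangChain level; it is an illustration only — the exact wiring lives inside `rag_core`, and the `langchain_text_splitters` import path assumes a recent LangChain release:

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Mirrors the CHUNK_SIZE / CHUNK_OVERLAP defaults from the table above.
# Lengths are measured in characters (the splitter's default length function).
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)

text = "RAG pipelines split long documents before embedding them. " * 40
chunks = splitter.split_text(text)

print(len(chunks))            # number of chunks produced
print(max(map(len, chunks)))  # every chunk is at most 500 characters
```

Smaller chunks give more precise matches but less context per hit; the overlap keeps sentences that straddle a chunk boundary retrievable from either side.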
## Usage Examples

### Python API

```python
from rag_core.config import get_config
from rag_core.model import Model
from rag_core.vector_store import QdrantVectorStore
from rag_core.search import RagService

# Wire up the embedding model and vector store from environment config.
cfg = get_config()
model = Model(model_name=cfg.model_name)
store = QdrantVectorStore.from_config(cfg)
rag = RagService(model=model, vector_store=store, config=cfg)

# Ingest one document, then run a semantic search against it.
docs = [{"text": "Hello world", "metadata": {"source": "demo"}}]
rag.ingest(collection="demo", documents=docs)

hits = rag.search(collection="demo", query="hello", top_k=3)
print(hits)
```

### MCP Tool Calls

```python
ingest_documents(
    collection="knowledge",
    documents=[{"text": "AI is transforming industries."}],
    chunking={"chunk_size": 500, "overlap": 50}
)

search(
    collection="knowledge",
    query="AI transformation",
    top_k=5,
    score_threshold=0.5
)
```

## Troubleshooting

### Connection Refused

Make sure Qdrant is running and responding:

```bash
docker compose up -d qdrant-service
curl http://localhost:6333/healthz
```

### Model Download Slow

First run downloads the embedding model from HuggingFace (~400MB); later runs use the local cache.

### GPU Support

```bash
pip install torch --index-url https://download.pytorch.org/whl/cu121
```

Then set `MODEL_DEVICE=cuda` in your MCP config.

## License

MIT