All communication between modules happens over HTTP:

- **Orchestrator → Generator**: HTTP streaming (SSE for real-time responses)
- **ChatUI → Orchestrator**: LangServe streaming endpoints
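
For example, a client can consume the orchestrator's LangServe stream with `langserve.RemoteRunnable`. This is a minimal sketch only: the route path and input schema are assumptions, not the orchestrator's actual API.

```python
# Minimal sketch of streaming from a LangServe endpoint.
# The URL and payload shape are illustrative -- check the routes
# registered in main.py for the real ones.
from langserve import RemoteRunnable

orchestrator = RemoteRunnable("https://my-org-orchestrator.hf.space/chat")  # hypothetical route

# stream() consumes the SSE stream and yields chunks as they arrive
for chunk in orchestrator.stream({"query": "What are EUDR requirements?"}):
    print(chunk, end="", flush=True)
```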

## Workflow Logic and File Processing

The orchestrator implements a dual-mode workflow designed to handle non-standard ingestion operations (e.g., Whisp GeoJSON API calls, whose results need to be returned directly without going through the generator) while maintaining conversational context across multiple turns. It also addresses a ChatUI quirk: uploaded files are re-sent on every turn of the conversation (e.g. follow-up queries).

### Processing Modes Overview

#### Mode 1: Direct Output (DIRECT_OUTPUT = True)

**Purpose:** Immediately return long-running ingestor results to the user without LLM processing, then use those results as context for follow-up questions.

**First File Upload:**
```
File Upload → Detect Type → Direct Output Ingest → Return Raw Results
                                     ↓
                          Cache Result (by file hash)
```

**Subsequent Conversation Turns:**
```
Follow-up Query → Detect Cached File → Retrieved Context → Combined Context → Generator
                          ↓
        Use Cached Ingestor Output as Context
```

**Key Behaviors:**
- First upload returns raw ingestor output immediately (no LLM generation)
- File content is hashed (SHA256) for deduplication
- Ingestor results are cached with the file hash
- All follow-up queries in the conversation use the cached ingestor output as retrieval context

**Notable Unintuitive Behavior:** Once a file is cached on the Orchestrator, re-uploading the same file (even with a different filename, in a different chat) skips re-processing.

**Example Conversation Flow:**
```
User: [Uploads plot_boundaries.geojson]
System: [Returns API analysis results directly - no LLM processing]

User: "What deforestation risks were identified?"
System: [Uses cached GeoJSON results + retrieval → LLM generation]

User: "How does this compare to EUDR requirements?"
System: [Uses same cached results + conversation history + retrieval → LLM generation]
```

#### Mode 2: Standard RAG (DIRECT_OUTPUT = False)

**Purpose:** Traditional RAG pipeline where uploaded files are treated as additional context for generation from the first turn onwards.

**Every Query (with or without file):**
```
Query + Optional File → Detect Type → Ingest → Retrieved Context → Combined Context → Generator
                                         ↓
                                  Add to Context
```

**Key Behaviors:**
- Files are processed through the ingestor when uploaded
- Ingestor output is added to the retrieval context (not returned directly)
- Generator always processes the combined context (ingestor + retriever)
- No special caching or deduplication logic

**Example Conversation Flow:**
```
User: "What are EUDR requirements?" + policy_document.pdf
System: [PDF → Retrieval → Combined Context → Generator]

User: "Summarize section 3"
System: [Retrieval → Combined Context → Generator]
```
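
To make the two modes concrete, here is a minimal sketch of the dispatch logic. The helper names (`ingest`, `retrieve`, `generate`) and control flow are assumptions for illustration, not the actual code in `main.py`:

```python
import hashlib

DIRECT_OUTPUT = True              # switches between Mode 1 and Mode 2
file_cache: dict[str, dict] = {}  # keyed by SHA256 of file content

def ingest(data: bytes) -> str:
    return f"[ingestor output for {len(data)} bytes]"  # placeholder for the ingestor call

def retrieve(query: str) -> str:
    return f"[retrieved docs for: {query}]"            # placeholder for the retriever call

def generate(query: str, context: str) -> str:
    return f"[LLM answer to {query!r} given context]"  # placeholder for the generator call

def handle_turn(query: str, file_bytes: bytes | None = None) -> str:
    file_context = ""
    if file_bytes is not None:
        if DIRECT_OUTPUT:
            # Mode 1: dedupe by content hash
            file_hash = hashlib.sha256(file_bytes).hexdigest()
            if file_hash not in file_cache:
                file_cache[file_hash] = {"ingestor_context": ingest(file_bytes)}
                # First upload: return raw ingestor output, no LLM generation
                return file_cache[file_hash]["ingestor_context"]
            # Re-sent or re-uploaded file: use the cached output as context
            file_context = file_cache[file_hash]["ingestor_context"]
        else:
            # Mode 2: no caching -- ingest every upload and add to context
            file_context = ingest(file_bytes)
    combined = file_context + "\n" + retrieve(query)  # combined context
    return generate(query, context=combined)
```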

### File Hash Caching Mechanism

The orchestrator uses SHA256 hashing to detect duplicate file uploads:

**Cache Structure:**
```python
{
    "a3f5c91...": {
        "ingestor_context": "API results...",
        "timestamp": "2025-10-02T14:30:00",
        "filename": "boundaries.geojson",
        "file_type": "geojson"
    }
}
```

**Detection Logic:**
1. File is uploaded
2. Compute SHA256 hash of file content
3. Check if hash exists in cache
4. If not found: process through ingestor, cache results
5. If found: use cached results (Skip Ingestion → Retrieved Context → Combined Context → Generator)
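
A sketch of steps 2-5, producing entries in the shape shown above (function and parameter names are illustrative, not the orchestrator's actual identifiers):

```python
import hashlib
from datetime import datetime

def lookup_or_ingest(cache: dict, file_bytes: bytes, filename: str,
                     file_type: str, run_ingestor) -> dict:
    # Step 2: hash the content -- identical files hash to the same key,
    # whatever their filename or originating chat
    file_hash = hashlib.sha256(file_bytes).hexdigest()
    # Step 3: check the cache
    if file_hash not in cache:
        # Step 4: miss -- run the ingestor and store the result
        cache[file_hash] = {
            "ingestor_context": run_ingestor(file_bytes),
            "timestamp": datetime.now().isoformat(timespec="seconds"),
            "filename": filename,
            "file_type": file_type,
        }
    # Step 5: hit (or freshly cached) -- caller feeds this into the context
    return cache[file_hash]
```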

### Conversation Context Management

The system maintains conversation history separately from file processing, with a simple management approach (sketched in code after the lists below):

**Context Building (`build_conversation_context()`):**
- Always includes first user/assistant exchange
- Includes last N complete turns (default: 3)
- Respects character limits (default: 12,000 chars)

**Retrieval Strategy:**
- Uses **only the latest user query** for semantic search
- Does NOT send entire conversation history to retriever
- Ensures relevant document retrieval based on current question

**Generation Context:**
- Combines: conversation history + retrieved context + cached file results
- Generator uses full context to produce coherent, contextually-aware responses
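
A minimal sketch of the behavior described by the three lists above. The message format and the internals of `build_conversation_context()` are assumptions based on the stated defaults:

```python
def build_conversation_context(messages: list[dict], max_turns: int = 3,
                               max_chars: int = 12_000) -> str:
    # messages alternate: [{"role": "user", ...}, {"role": "assistant", ...}, ...]
    turns = [messages[i:i + 2] for i in range(0, len(messages), 2)]
    if len(turns) > max_turns + 1:
        turns = turns[:1] + turns[-max_turns:]   # first exchange + last N turns
    lines = [f"{m['role']}: {m['content']}" for turn in turns for m in turn]
    return "\n".join(lines)[-max_chars:]         # naive enforcement of the char budget

def retrieval_query(messages: list[dict]) -> str:
    # Only the latest user query goes to the retriever for semantic search
    return messages[-1]["content"]
```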

## Components

### 1. Main Application (`main.py`)

Create a `.env` file with:

```bash
# Required for accessing private HuggingFace Spaces modules
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxx
```
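
For reference, this is the usual way the token is attached when one module calls a private Space over HTTP (the URL is hypothetical; the real client code lives in the orchestrator):

```python
import os
import requests

# Private Spaces accept the HF token as a Bearer header
headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}
resp = requests.get(
    "https://my-org-my-generator.hf.space/health",  # hypothetical private Space URL
    headers=headers,
    timeout=30,
)
resp.raise_for_status()
```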

Key things to ensure here:
- `multimodalAcceptedMimetypes`: file types to accept for upload via ChatUI
- `endpoints`: orchestrator URL + endpoints (note these are the HF API URLs, not the HF UI URLs); see the sketch below
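
As a rough illustration, the relevant entry in ChatUI's `.env.local` could look like the sketch below. The model name, mimetype list, URL, and endpoint `type` are placeholders; check them against your ChatUI version and the orchestrator's actual routes.

```bash
# Hypothetical MODELS entry -- all values are placeholders
MODELS=`[
  {
    "name": "orchestrator",
    "multimodalAcceptedMimetypes": ["application/geo+json", "application/pdf"],
    "endpoints": [
      { "type": "langserve", "url": "https://my-org-orchestrator.hf.space" }
    ]
  }
]`
```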

## Deployment Guide

Gradio's default API endpoint for UI interactions. If running on HuggingFace Spaces, access it via: https://[ORG_NAME]-[SPACE_NAME].hf.space/gradio/

*NOTE: for HF deployment we have to access the Gradio test UI this way, because it is not possible to expose multiple ports on HF Spaces. For the other modules we expose Gradio directly, but with the Orchestrator we need to run FastAPI to support the LangServe endpoints.*
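
This implies the Gradio test UI is mounted inside the FastAPI app so both share a single port. A minimal sketch under that assumption (the `/gradio` path matches the URL above; the chain and UI contents are placeholders):

```python
import gradio as gr
from fastapi import FastAPI
from langchain_core.runnables import RunnableLambda
from langserve import add_routes

app = FastAPI()

# LangServe endpoints live on the FastAPI app (placeholder chain shown)
add_routes(app, RunnableLambda(lambda x: x), path="/chat")

with gr.Blocks() as demo:
    gr.Markdown("Orchestrator test UI")  # placeholder UI

# Mount the Gradio UI under /gradio -- one port serves both
app = gr.mount_gradio_app(app, demo, path="/gradio")
```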

## Troubleshooting