mtyrrell committed
Commit 24b5c2c
·
1 Parent(s): ef736ff

updated README

Files changed (1)
  1. README.md +109 -8
README.md CHANGED
@@ -87,20 +87,119 @@ All communication between modules happens over HTTP:
  - **Orchestrator ↔ Generator**: HTTP streaming (SSE for real-time responses)
  - **ChatUI ↔ Orchestrator**: LangServe streaming endpoints

- ### Workflow Logic
-
- The orchestrator implements two distinct workflows:
-
- **Direct Output Workflow** (when `DIRECT_OUTPUT=True` and file is new):
- ```
- File Upload → Detect Type → Ingest → Direct Output → Return Results
- ```
-
- **Standard RAG Workflow** (default or cached files):
- ```
- Query → Retrieve Context → Generate Response → Stream to User
- ```
-
+ ## Workflow Logic and File Processing
+
+ The orchestrator implements a dual-mode workflow designed to handle non-standard ingestion operations (e.g., Whisp GeoJSON API calls, whose results need to be returned directly without going through the generator) while maintaining conversational context across multiple turns. This also works around a ChatUI quirk: uploaded files are re-sent on every turn of the conversation (e.g., follow-up queries).
+
+ ### Processing Modes Overview
+
+ #### Mode 1: Direct Output (DIRECT_OUTPUT = True)
+
+ **Purpose:** Immediately return long-running ingestor results to the user without LLM processing, then use those results as context for follow-up questions.
+
+ **First File Upload:**
+ ```
+ File Upload → Detect Type → Direct Output Ingest → Return Raw Results
+                                    ↓
+                      Cache Result (by file hash)
+ ```
+
+ **Subsequent Conversation Turns:**
+ ```
+ Follow-up Query → Detect Cached File → Retrieved Context → Combined Context → Generator
+                         ↓
+         Use Cached Ingestor Output as Context
+ ```
+
+ **Key Behaviors:**
+ - First upload returns raw ingestor output immediately (no LLM generation)
+ - File content is hashed (SHA256) for deduplication
+ - Ingestor results are cached under the file hash
+ - All follow-up queries in the conversation use the cached ingestor output as retrieval context
+
+ **Notable Unintuitive Behavior:** Once the file is cached on the Orchestrator, re-uploading the same file (even with a different filename in a different chat) skips re-processing.
+
+ **Example Conversation Flow:**
+ ```
+ User: [Uploads plot_boundaries.geojson]
+ System: [Returns API analysis results directly - no LLM processing]
+
+ User: "What deforestation risks were identified?"
+ System: [Uses cached GeoJSON results + retrieval → LLM generation]
+
+ User: "How does this compare to EUDR requirements?"
+ System: [Uses same cached results + conversation history + retrieval → LLM generation]
+ ```
+
+ #### Mode 2: Standard RAG (DIRECT_OUTPUT = False)
+
+ **Purpose:** Traditional RAG pipeline where uploaded files are treated as additional context for generation from the first turn.
+
+ **Every Query (with or without file):**
+ ```
+ Query + Optional File → Detect Type → Ingest → Retrieved Context → Combined Context → Generator
+                                         ↓
+                                 Add to Context
+ ```
+
+ **Key Behaviors:**
+ - Files are processed through the ingestor when uploaded
+ - Ingestor output is added to the retrieval context (not returned directly)
+ - Generator always processes the combined context (ingestor + retriever)
+ - No special caching or deduplication logic
+
+ **Example Conversation Flow:**
+ ```
+ User: "What are EUDR requirements?" + policy_document.pdf
+ System: [PDF → Retrieval → Combined Context → Generator]
+
+ User: "Summarize section 3"
+ System: [Retrieval → Combined Context → Generator]
+ ```
+
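+ The sketch below illustrates how this mode dispatch could look. It is a simplified illustration only: `run_ingestor`, `retrieve_context`, `generate_response`, and `file_cache` are placeholder names, not the actual orchestrator functions.
+
+ ```python
+ import hashlib
+
+ DIRECT_OUTPUT = True             # processing-mode flag (from config/env)
+ file_cache: dict[str, str] = {}  # SHA256 hex digest -> cached ingestor output
+
+ def run_ingestor(data: bytes) -> str:
+     return f"[ingestor output for {len(data)} bytes]"   # placeholder
+
+ def retrieve_context(query: str) -> str:
+     return f"[retrieved context for: {query}]"          # placeholder
+
+ def generate_response(query: str, context: str) -> str:
+     return f"[LLM answer to {query!r}]"                 # placeholder
+
+ def handle_turn(query: str, file_bytes: bytes | None) -> str:
+     cached = None
+     if file_bytes is not None:
+         if DIRECT_OUTPUT:
+             # Mode 1: dedupe by content hash; first sighting returns raw output
+             file_hash = hashlib.sha256(file_bytes).hexdigest()
+             if file_hash not in file_cache:
+                 file_cache[file_hash] = run_ingestor(file_bytes)
+                 return file_cache[file_hash]      # no LLM pass on first upload
+             cached = file_cache[file_hash]        # re-upload: skip re-processing
+         else:
+             cached = run_ingestor(file_bytes)     # Mode 2: ingest on every upload
+     # Standard RAG path: retrieve on the latest query, combine, generate
+     combined = "\n\n".join(c for c in (cached, retrieve_context(query)) if c)
+     return generate_response(query, combined)
+ ```
+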
+ ### File Hash Caching Mechanism
+
+ The orchestrator uses SHA256 hashing to detect duplicate file uploads:
+
+ **Cache Structure:**
+ ```python
+ {
+     "a3f5c91...": {
+         "ingestor_context": "API results...",
+         "timestamp": "2025-10-02T14:30:00",
+         "filename": "boundaries.geojson",
+         "file_type": "geojson"
+     }
+ }
+ ```
+
+ **Detection Logic:**
+ 1. File is uploaded
+ 2. Compute SHA256 hash of file content
+ 3. Check if hash exists in cache
+ 4. If not found: Process through ingestor, cache results
+ 5. If found: Use cached results (Skip Ingestion → Retrieved Context → Combined Context → Generator)
+
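+ A minimal sketch of steps 2-5, assuming a process-local dict shaped like the cache structure above (`get_or_ingest` and `run_ingestor` are illustrative names, not the actual implementation):
+
+ ```python
+ import hashlib
+ from datetime import datetime, timezone
+
+ file_cache: dict[str, dict] = {}  # keyed by SHA256 hex digest
+
+ def run_ingestor(content: bytes, file_type: str) -> str:
+     return f"[ingested {file_type}, {len(content)} bytes]"  # placeholder
+
+ def get_or_ingest(content: bytes, filename: str, file_type: str) -> str:
+     file_hash = hashlib.sha256(content).hexdigest()  # step 2
+     entry = file_cache.get(file_hash)                # step 3
+     if entry is None:                                # step 4: first sighting
+         entry = {
+             "ingestor_context": run_ingestor(content, file_type),
+             "timestamp": datetime.now(timezone.utc).isoformat(),
+             "filename": filename,
+             "file_type": file_type,
+         }
+         file_cache[file_hash] = entry
+     # step 5: cached (or freshly ingested) output feeds the combined context
+     return entry["ingestor_context"]
+ ```
+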
+ ### Conversation Context Management
+
+ The system maintains conversation history separately from file processing, using a simple management approach:
+
+ **Context Building (`build_conversation_context()`):**
+ - Always includes the first user/assistant exchange
+ - Includes the last N complete turns (default: 3)
+ - Respects character limits (default: 12,000 chars)
+
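+ A sketch of how such a builder might look (the signature and the exact truncation policy here are assumptions, not the actual `build_conversation_context()` implementation):
+
+ ```python
+ def build_conversation_context(history: list[dict],
+                                max_turns: int = 3,
+                                max_chars: int = 12_000) -> str:
+     """history: [{"role": "user" | "assistant", "content": "..."}, ...]"""
+     def fmt(messages):
+         return "\n".join(f"{m['role']}: {m['content']}" for m in messages)
+
+     first_exchange = history[:2]             # always keep the opening exchange
+     recent = history[2:][-(max_turns * 2):]  # last N complete turns
+     # Drop the oldest recent turns until the character budget is met
+     while recent and len(fmt(first_exchange + recent)) > max_chars:
+         recent = recent[2:]
+     return fmt(first_exchange + recent)
+ ```
+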
+ **Retrieval Strategy:**
+ - Uses **only the latest user query** for semantic search
+ - Does NOT send the entire conversation history to the retriever
+ - Ensures relevant document retrieval based on the current question
+
+ **Generation Context:**
+ - Combines: Conversation history + Retrieved context + Cached file results
+ - Generator uses the full context to produce coherent, contextually aware responses
+

  ## Components

  ### 1. Main Application (`main.py`)
@@ -207,7 +306,7 @@ MAX_CONTEXT_CHARS = 15000
  Create a `.env` file with:

  ```bash
- # Required for private HuggingFace Spaces
+ # Required for accessing private HuggingFace Spaces modules
  HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxx

  ```
@@ -267,7 +366,7 @@ LLM_SUMMARIZATION=false

  Key things to ensure here:
  - multimodalAcceptedMimetypes: file types to accept for upload via ChatUI
- - endpoints: orchestrator url + endpoints
+ - endpoints: orchestrator URL + endpoints (note these are the HF API URLs, not the HF UI URLs)

  ## Deployment Guide

@@ -487,6 +586,8 @@ Clears the direct output file cache.

  Gradio's default API endpoint for UI interactions. If running on huggingface spaces, access via: https://[ORG_NAME]-[SPACE_NAME].hf.space/gradio/

+ *NOTE: for HF deployment we have to access the Gradio test UI this way because it is not possible to expose multiple ports on HF Spaces. The other modules expose Gradio directly, but the Orchestrator needs to run FastAPI to support the LangServe endpoints.*
+
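+ A minimal sketch of this single-port arrangement, assuming standard FastAPI, LangServe, and Gradio APIs (the echo runnable and demo UI are placeholders, not the actual orchestrator objects):
+
+ ```python
+ import gradio as gr
+ from fastapi import FastAPI
+ from langserve import add_routes
+ from langchain_core.runnables import RunnableLambda
+
+ app = FastAPI()
+
+ # LangServe streaming endpoints consumed by ChatUI (placeholder runnable)
+ add_routes(app, RunnableLambda(lambda q: f"echo: {q}"), path="/chat")
+
+ # Gradio test UI mounted on the same FastAPI app, reachable under /gradio/
+ demo = gr.Interface(fn=lambda q: f"echo: {q}", inputs="text", outputs="text")
+ app = gr.mount_gradio_app(app, demo, path="/gradio")
+
+ # Run everything on the single port HF Spaces exposes:
+ #   uvicorn main:app --host 0.0.0.0 --port 7860
+ ```
+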

  ## Troubleshooting
