LiamKhoaLe committed on
Commit ab69c75 · 1 parent: b32ca93

Upd MAC agentic strats

Files changed (2)
  1. README.md +36 -7
  2. app.py +415 -26
README.md CHANGED
@@ -59,9 +59,12 @@ tags:
  - **Text-to-Speech**: Voice output generation using Maya1 TTS model (optional, fallback to MCP if unavailable) plus a one-click "Play Response" control for the latest answer

  ### 🤝 **MAC Architecture (Multi-Agent Collaboration)**
- - **Gemini Supervisor**: Orchestrates query processing by breaking queries into 2-4 focused sub-topics (JSON format)
  - **MedSwin Specialist**: Executes tasks sequentially, providing concise clinical answers
  - **Search Mode**: Gemini creates 1-4 search strategies → executes ddgs searches (1-2 sources each) → summarizes briefly
  - **RAG Mode**: Gemini brainstorms retrieved documents into 1-4 short contexts for MedSwin decision-making
  - **Clean Output**: All internal thoughts/conversations are logged only; users see only the final answer
  - **Markdown Format**: Final answers use bullet points (tables automatically converted)
@@ -152,8 +155,11 @@ MedLLM Agent is designed to support **doctors, clinicians, and medical specialis

  #### 1. **MAC Architecture (Multi-Agent Collaboration)**
  - **Gemini Supervisor Agent**:
- - Breaks user queries into 2-4 focused sub-topics (JSON format)
  - In search mode: creates 1-4 search strategies, executes ddgs (1-2 sources each), summarizes briefly
  - In RAG mode: brainstorms retrieved documents into 1-4 concise contexts
  - All supervisor decisions logged internally, not displayed

@@ -196,9 +202,11 @@ MedLLM Agent is designed to support **doctors, clinicians, and medical specialis

  ### **How It Works: MAC Architecture in Action**

- 1. **Gemini Supervisor - Query Breakdown** → Analyzes query and breaks into 2-4 sub-topics (JSON):
  - Example: "What are the treatment options for Type 2 diabetes in elderly patients with renal impairment?"
- - Creates structured sub-topics: treatment options, age considerations, renal function impact
  - All planning logged internally, not displayed to user

  2. **Gemini Supervisor - Context Preparation**:
@@ -212,8 +220,26 @@ MedLLM Agent is designed to support **doctors, clinicians, and medical specialis
  - Generates concise clinical answers (Markdown bullets, no tables)
  - All execution logged internally

- 4. **Final Answer Assembly**:
- - Combines all MedSwin task answers
  - Converts any tables to Markdown bullets
  - Adds citations if web sources used
  - Translates back if needed
@@ -225,7 +251,10 @@ MedLLM Agent is designed to support **doctors, clinicians, and medical specialis
  ✅ **Evidence-Based Decisions**: Grounds answers in uploaded documents and current medical literature
  ✅ **Reduced Hallucination**: RAG ensures answers are based on actual documents and verified sources
  ✅ **Comprehensive Coverage**: Combines institutional knowledge (documents) with current medical knowledge (web)
- ✅ **Quality Assurance**: Self-reflection ensures high-quality, complete answers
  ✅ **Scalability**: Handles multiple languages, complex queries, and large document libraries
  ✅ **Clinical Workflow Integration**: Designed to fit into existing clinical decision-making processes
  ✅ **MCP Protocol**: Standardized tool integration for reliable, maintainable web search capabilities
 
  - **Text-to-Speech**: Voice output generation using Maya1 TTS model (optional, fallback to MCP if unavailable) plus a one-click "Play Response" control for the latest answer

  ### 🤝 **MAC Architecture (Multi-Agent Collaboration)**
+ - **Gemini Supervisor**: Orchestrates query processing by breaking queries into flexible sub-topics (up to 10 based on complexity, explores different approaches)
  - **MedSwin Specialist**: Executes tasks sequentially, providing concise clinical answers
+ - **Enhanced Synthesis**: Supervisor synthesizes all MedSwin responses with clear context into comprehensive final answers
+ - **Iterative Improvement**: Supervisor challenges and enhances answers until confirmed optimal (up to 2 iterations)
  - **Search Mode**: Gemini creates 1-4 search strategies → executes ddgs searches (1-2 sources each) → summarizes briefly
+ - **Conditional Search Trigger**: When search mode enabled, supervisor can trigger additional searches if answer is unclear or has gaps
  - **RAG Mode**: Gemini brainstorms retrieved documents into 1-4 short contexts for MedSwin decision-making
  - **Clean Output**: All internal thoughts/conversations are logged only; users see only the final answer
  - **Markdown Format**: Final answers use bullet points (tables automatically converted)
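
The Search Mode bullet sets two caps: at most four search strategies, and one or two sources per strategy. The sketch below shows that fan-out in Python. It assumes a hypothetical `search_fn(query)` that returns a list of `{"url": ..., "snippet": ...}` dicts. The diff does not show the app's ddgs wrapper, so no real signature is implied. Skipping URLs already collected by an earlier strategy is an extra assumption, not something the README states.

```python
from typing import Callable, Dict, List

MAX_STRATEGIES = 4         # README: "1-4 search strategies"
MAX_SOURCES_PER_QUERY = 2  # README: "1-2 sources each"

# Hypothetical signature: query string -> list of {"url": ..., "snippet": ...} dicts.
SearchFn = Callable[[str], List[Dict[str, str]]]


def run_search_mode(strategies: List[str], search_fn: SearchFn) -> List[Dict[str, str]]:
    """Run at most MAX_STRATEGIES queries and keep at most
    MAX_SOURCES_PER_QUERY not-yet-seen URLs per query."""
    seen_urls = set()
    collected: List[Dict[str, str]] = []
    for query in strategies[:MAX_STRATEGIES]:
        kept = 0
        for result in search_fn(query):
            if kept >= MAX_SOURCES_PER_QUERY:
                break
            url = result.get("url", "")
            if url in seen_urls:
                continue  # assumption: a source found by an earlier strategy is skipped
            seen_urls.add(url)
            collected.append({"query": query, **result})
            kept += 1
    return collected


if __name__ == "__main__":
    def fake_search(query: str) -> List[Dict[str, str]]:
        return [{"url": f"https://example.org/{query}/{i}", "snippet": f"{query} #{i}"} for i in range(3)]

    hits = run_search_mode(["metformin", "egfr", "sglt2", "elderly", "extra"], fake_search)
    print(len(hits))  # 5 strategies capped to 4, 2 sources each: 8
```

Capping before any request is sent keeps the search cost bounded by `MAX_STRATEGIES * MAX_SOURCES_PER_QUERY`, no matter how many strategies the supervisor proposes.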
 

  #### 1. **MAC Architecture (Multi-Agent Collaboration)**
  - **Gemini Supervisor Agent**:
+ - Breaks user queries into flexible sub-topics (up to 10 based on complexity, explores different approaches/angles)
+ - Synthesizes all MedSwin responses with clear context into comprehensive final answers
+ - Challenges and enhances answers iteratively until confirmed optimal (up to 2 iterations)
  - In search mode: creates 1-4 search strategies, executes ddgs (1-2 sources each), summarizes briefly
+ - Conditional search trigger: Can trigger additional searches if answer is unclear or has gaps (only when search mode enabled)
  - In RAG mode: brainstorms retrieved documents into 1-4 concise contexts
  - All supervisor decisions logged internally, not displayed

 

  ### **How It Works: MAC Architecture in Action**

+ 1. **Gemini Supervisor - Query Breakdown** → Analyzes query and breaks into flexible sub-topics (up to 10 based on complexity):
  - Example: "What are the treatment options for Type 2 diabetes in elderly patients with renal impairment?"
+ - Explores different approaches (clinical, diagnostic, treatment, prevention perspectives)
+ - Creates structured sub-topics: treatment options, age considerations, renal function impact, drug interactions, monitoring protocols
+ - Number of subtasks adapts to query complexity (not limited to 4)
  - All planning logged internally, not displayed to user

  2. **Gemini Supervisor - Context Preparation**:
 
  - Generates concise clinical answers (Markdown bullets, no tables)
  - All execution logged internally

+ 4. **Gemini Supervisor - Answer Synthesis**:
+ - Synthesizes all MedSwin responses with clear context
+ - Integrates information from all sub-topics seamlessly
+ - Creates coherent, comprehensive final answer
+ - Provides better context than simple concatenation
+
+ 5. **Gemini Supervisor - Challenge & Enhancement Loop**:
+ - Evaluates answer quality (completeness, accuracy, clarity)
+ - Challenges answer if not optimal
+ - Provides specific enhancement instructions
+ - Enhances answer iteratively (up to 2 iterations)
+ - Continues until answer confirmed optimal
+
+ 6. **Conditional Search Trigger** (only when search mode enabled):
+ - Supervisor checks if answer is unclear or has gaps
+ - If needed, generates specific search queries to fill gaps
+ - Executes additional searches
+ - Enhances answer with new search context
+
+ 7. **Final Answer Assembly**:
  - Converts any tables to Markdown bullets
  - Adds citations if web sources used
  - Translates back if needed
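
Steps 5 and 6 describe a bounded challenge-and-enhance loop followed by an optional search pass. The sketch below captures that control flow in Python. Plain callables stand in for the app's `gemini_supervisor_challenge`, `gemini_supervisor_enhance_answer` and `gemini_supervisor_check_clarity`, with their argument lists simplified. The `search` callable is hypothetical. The dictionary keys are the ones the supervisor prompts in app.py ask for. This diff does not show the exact order in which `stream_chat` invokes these steps.

```python
from typing import Callable, Dict, List

MAX_CHALLENGE_ITERATIONS = 2  # README: "up to 2 iterations"


def refine_answer(
    answer: str,
    challenge: Callable[[str], Dict],          # stand-in for the challenge step
    enhance: Callable[[str, str], str],        # stand-in for the enhancement step
    check_clarity: Callable[[str], Dict],      # stand-in for the clarity check
    search: Callable[[List[str]], List[str]],  # hypothetical: queries -> context snippets
    use_web_search: bool,
) -> str:
    # Step 5: challenge and enhance, stopping as soon as the evaluator reports optimal.
    for _ in range(MAX_CHALLENGE_ITERATIONS):
        evaluation = challenge(answer)
        if evaluation.get("is_optimal", True):
            break
        instructions = evaluation.get("enhancement_instructions", "")
        if not instructions:
            break  # nothing actionable to apply
        answer = enhance(answer, instructions)

    # Step 6: conditional search trigger, consulted only in search mode.
    if use_web_search:
        clarity = check_clarity(answer)
        queries = clarity.get("search_queries", [])
        if clarity.get("needs_search") and queries:
            snippets = search(queries)
            if snippets:
                answer = enhance(answer, "Integrate this new context: " + " | ".join(snippets))
    return answer
```

The iteration bound matters because the evaluator may never report `is_optimal: true`. Without the bound, the loop would keep spending supervisor calls on every query.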
 
  ✅ **Evidence-Based Decisions**: Grounds answers in uploaded documents and current medical literature
  ✅ **Reduced Hallucination**: RAG ensures answers are based on actual documents and verified sources
  ✅ **Comprehensive Coverage**: Combines institutional knowledge (documents) with current medical knowledge (web)
+ ✅ **Enhanced Quality**: Iterative challenge loop ensures answers are optimal before delivery
+ ✅ **Flexible Task Breakdown**: Adapts to query complexity with flexible subtask generation (not limited to 4 steps)
+ ✅ **Intelligent Search**: Conditional search trigger fills gaps when answers are unclear
+ ✅ **Better Context**: Enhanced synthesis provides clearer, more comprehensive final answers
  ✅ **Scalability**: Handles multiple languages, complex queries, and large document libraries
  ✅ **Clinical Workflow Integration**: Designed to fit into existing clinical decision-making processes
  ✅ **MCP Protocol**: Standardized tool integration for reliable, maintainable web search capabilities
app.py CHANGED
@@ -1172,7 +1172,7 @@ def autonomous_execution_strategy(reasoning: dict, plan: dict, use_rag: bool, us

  async def gemini_supervisor_breakdown_async(query: str, use_rag: bool, use_web_search: bool, time_elapsed: float, max_duration: int = 120) -> dict:
  """
- Gemini Supervisor: Break user query into 2-4 sub-topics (JSON format)
  This is the main supervisor function that orchestrates the MAC architecture.
  All internal thoughts are logged, not displayed.
  """
@@ -1186,12 +1186,20 @@ async def gemini_supervisor_breakdown_async(query: str, use_rag: bool, use_web_s
  if not mode_description:
  mode_description.append("Direct answer mode - no additional context")

  prompt = f"""You are a supervisor agent coordinating with a MedSwin medical specialist model.
- Break the following medical query into 2-4 focused sub-topics that MedSwin can answer sequentially.

  Query: "{query}"
  Mode: {', '.join(mode_description)}
  Time Remaining: ~{remaining_time:.1f}s

  Return ONLY valid JSON (no markdown, no tables, no explanations):
  {{
@@ -1201,17 +1209,23 @@ Return ONLY valid JSON (no markdown, no tables, no explanations):
  "topic": "concise topic name",
  "instruction": "specific directive for MedSwin to answer this topic",
  "expected_tokens": 200,
- "priority": "high|medium|low"
  }},
  ...
  ],
- "max_topics": 4,
- "strategy": "brief strategy description"
  }}

- Keep topics focused and actionable. Each topic should be answerable in ~200 tokens by MedSwin."""

- system_prompt = "You are a medical query supervisor. Break queries into structured JSON sub-topics. Return ONLY valid JSON."

  response = await call_agent(
  user_prompt=prompt,
@@ -1236,11 +1250,11 @@ Keep topics focused and actionable. Each topic should be answerable in ~200 toke
  # Fallback: simple breakdown
  breakdown = {
  "sub_topics": [
- {"id": 1, "topic": "Core Question", "instruction": "Address the main medical question", "expected_tokens": 200, "priority": "high"},
- {"id": 2, "topic": "Clinical Details", "instruction": "Provide key clinical insights", "expected_tokens": 200, "priority": "medium"},
  ],
- "max_topics": 2,
- "strategy": "Sequential answer with key points"
  }
  logger.warning(f"[GEMINI SUPERVISOR] Using fallback breakdown")
  return breakdown
@@ -1367,11 +1381,11 @@ def gemini_supervisor_breakdown(query: str, use_rag: bool, use_web_search: bool,
  logger.warning("[GEMINI SUPERVISOR] MCP unavailable, using fallback breakdown")
  return {
  "sub_topics": [
- {"id": 1, "topic": "Core Question", "instruction": "Address the main medical question", "expected_tokens": 200, "priority": "high"},
- {"id": 2, "topic": "Clinical Details", "instruction": "Provide key clinical insights", "expected_tokens": 200, "priority": "medium"},
  ],
- "max_topics": 2,
- "strategy": "Sequential answer with key points"
  }

  try:
@@ -1392,10 +1406,10 @@ def gemini_supervisor_breakdown(query: str, use_rag: bool, use_web_search: bool,
  logger.error(f"[GEMINI SUPERVISOR] Breakdown request failed: {exc}")
  return {
  "sub_topics": [
- {"id": 1, "topic": "Core Question", "instruction": "Address the main medical question", "expected_tokens": 200, "priority": "high"},
  ],
- "max_topics": 1,
- "strategy": "Direct answer"
  }

  def gemini_supervisor_search_strategies(query: str, time_elapsed: float) -> dict:
@@ -1541,6 +1555,311 @@ def execute_medswin_task(
  logger.info(f"[MEDSWIN] Task completed: {len(response)} chars generated")
  return response

  async def self_reflection_gemini(answer: str, query: str) -> dict:
  """Self-reflection using Gemini MCP"""
  reflection_prompt = f"""Evaluate this medical answer for quality and completeness:
@@ -1879,10 +2198,10 @@ def stream_chat(
  # Simple breakdown for direct mode
  breakdown = {
  "sub_topics": [
- {"id": 1, "topic": "Answer", "instruction": message, "expected_tokens": 400, "priority": "high"}
  ],
- "max_topics": 1,
- "strategy": "Direct answer"
  }
  else:
  logger.info("[GEMINI SUPERVISOR] Breaking query into sub-topics...")
@@ -2017,10 +2336,16 @@ def stream_chat(
  # Continue with next task
  continue

- # ===== STEP 5: Combine all MedSwin answers into final answer =====
- final_answer = "\n\n".join(medswin_answers) if medswin_answers else "I apologize, but I was unable to generate a response."
- citations_text = ""
-
  # Clean final answer - ensure no tables, only Markdown bullets
  if "|" in final_answer and "---" in final_answer:
  logger.warning("[MEDSWIN] Final answer contains tables, converting to bullets")
@@ -2036,7 +2361,71 @@ def stream_chat(
  cleaned_lines.append(line)
  final_answer = '\n'.join(cleaned_lines)

- # ===== STEP 6: Finalize answer (translate, add citations, format) =====
  # Translate back if needed
  if needs_translation and final_answer:
  logger.info(f"[GEMINI SUPERVISOR] Translating response back to {original_lang}...")
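
The table-cleaning hunk shows only the detection heuristic (`"|" in final_answer and "---" in final_answer`) and the final `cleaned_lines` rebuild. The conversion in between falls outside the hunk. The sketch below is one way to flatten Markdown pipe tables into `Header: value` bullets. It assumes the first pipe-delimited row of each table is its header. The function names are hypothetical, and this is not the app's own conversion.

```python
import re
from typing import List

# Alignment rows such as |---|:---:| or | --- | --- |
_SEPARATOR_ROW = re.compile(r"^\|?\s*:?-{3,}:?\s*(\|\s*:?-{3,}:?\s*)*\|?$")


def _cells(row: str) -> List[str]:
    return [cell.strip() for cell in row.strip("|").split("|")]


def tables_to_bullets(text: str) -> str:
    """Flatten pipe tables into 'Header: value' bullets; other lines pass through."""
    out: List[str] = []
    header: List[str] = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("|") and stripped.endswith("|"):
            if _SEPARATOR_ROW.match(stripped):
                continue  # drop the alignment row
            cells = _cells(stripped)
            if not header:
                header = cells  # assumption: first row of each table is its header
                continue
            pairs = [f"{h}: {c}" if h else c for h, c in zip(header, cells)]
            out.append("- " + "; ".join(pairs))
        else:
            header = []  # any non-table line ends the current table
            out.append(line)
    return "\n".join(out)
```

Keeping the header name in every bullet preserves the column meaning, which a plain "strip the pipes" pass would lose.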
 

  async def gemini_supervisor_breakdown_async(query: str, use_rag: bool, use_web_search: bool, time_elapsed: float, max_duration: int = 120) -> dict:
  """
+ Gemini Supervisor: Break user query into sub-topics (flexible number, explore different approaches)
  This is the main supervisor function that orchestrates the MAC architecture.
  All internal thoughts are logged, not displayed.
  """
 
  if not mode_description:
  mode_description.append("Direct answer mode - no additional context")

+ # Calculate reasonable max topics based on time remaining
+ # Allow more subtasks if we have time, but be flexible
+ estimated_time_per_task = 8 # seconds per task
+ max_topics_by_time = max(2, int((remaining_time - 20) / estimated_time_per_task))
+ max_topics = min(max_topics_by_time, 10) # Cap at 10, but allow more than 4
+
  prompt = f"""You are a supervisor agent coordinating with a MedSwin medical specialist model.
+ Break the following medical query into focused sub-topics that MedSwin can answer sequentially.
+ Explore different potential approaches to comprehensively address the topic.

  Query: "{query}"
  Mode: {', '.join(mode_description)}
  Time Remaining: ~{remaining_time:.1f}s
+ Maximum Topics: {max_topics} (adjust based on complexity - use as many as needed for thorough coverage)

  Return ONLY valid JSON (no markdown, no tables, no explanations):
  {{
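
The new topic cap depends only on the time budget. The sketch below pulls the three added lines into a pure function so the cap can be checked for a few elapsed times. One assumption: `remaining_time` is taken as `max_duration - time_elapsed`, because the line that computes it lies outside the hunk. The constants (8 s per task, 20 s reserve, floor 2, cap 10) come from the diff. The function name and parameter names are hypothetical.

```python
def topic_budget(
    time_elapsed: float,
    max_duration: int = 120,
    seconds_per_task: float = 8.0,
    reserve_seconds: float = 20.0,
    min_topics: int = 2,
    max_cap: int = 10,
) -> int:
    """Pure-function version of the max_topics formula in the hunk above."""
    # Assumption: remaining_time is max_duration - time_elapsed (not shown in the diff).
    remaining_time = max_duration - time_elapsed
    # Keep a 20 s reserve, then divide the rest by ~8 s per MedSwin task.
    max_topics_by_time = max(min_topics, int((remaining_time - reserve_seconds) / seconds_per_task))
    # Never ask for more than 10 topics, however much time is left.
    return min(max_topics_by_time, max_cap)


for elapsed in (0, 60, 70, 100, 110):
    print(elapsed, topic_budget(elapsed))
```

With nothing elapsed, (120 - 20) / 8 = 12.5, which truncates to 12 and is then capped at 10. At 70 s elapsed, 30 / 8 = 3.75 gives 3. From 100 s on, the floor of 2 applies even though the 20 s reserve is already used up. So a late query can still request two MedSwin tasks.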
 
  "topic": "concise topic name",
  "instruction": "specific directive for MedSwin to answer this topic",
  "expected_tokens": 200,
+ "priority": "high|medium|low",
+ "approach": "brief description of approach/angle for this topic"
  }},
  ...
  ],
+ "strategy": "brief strategy description explaining the breakdown approach",
+ "exploration_note": "brief note on different approaches explored"
  }}

+ Guidelines:
+ - Break down the query into as many subtasks as needed for comprehensive coverage
+ - Explore different angles/approaches (e.g., clinical, diagnostic, treatment, prevention, research perspectives)
+ - Each topic should be focused and answerable in ~200 tokens by MedSwin
+ - Prioritize topics by importance (high priority first)
+ - Don't limit yourself to 4 topics - use more if the query is complex or multi-faceted"""

+ system_prompt = "You are a medical query supervisor. Break queries into structured JSON sub-topics, exploring different approaches. Return ONLY valid JSON."

  response = await call_agent(
  user_prompt=prompt,
 
  # Fallback: simple breakdown
  breakdown = {
  "sub_topics": [
+ {"id": 1, "topic": "Core Question", "instruction": "Address the main medical question", "expected_tokens": 200, "priority": "high", "approach": "direct answer"},
+ {"id": 2, "topic": "Clinical Details", "instruction": "Provide key clinical insights", "expected_tokens": 200, "priority": "medium", "approach": "clinical perspective"},
  ],
+ "strategy": "Sequential answer with key points",
+ "exploration_note": "Fallback breakdown - basic coverage"
  }
  logger.warning(f"[GEMINI SUPERVISOR] Using fallback breakdown")
  return breakdown
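
The prompt asks the model to stay within `max_topics`, order topics high priority first and add an `approach` per topic. The visible hunks, though, do not show any check on what the model actually returns. The sketch below is a hypothetical post-processing step. It drops topics without an instruction, fills missing fields, sorts by priority, truncates to the cap and renumbers. If nothing usable remains, it falls back to the same single-topic breakdown the diff uses. None of these names exist in app.py.

```python
import copy
from typing import Any, Dict, List

PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}

# Mirrors the single-topic fallback the diff returns when the breakdown request fails.
FALLBACK_BREAKDOWN: Dict[str, Any] = {
    "sub_topics": [
        {"id": 1, "topic": "Core Question", "instruction": "Address the main medical question",
         "expected_tokens": 200, "priority": "high", "approach": "direct answer"},
    ],
    "strategy": "Direct answer",
    "exploration_note": "Fallback breakdown - single topic",
}


def normalize_breakdown(raw: Dict[str, Any], max_topics: int) -> Dict[str, Any]:
    """Hypothetical post-processing for the supervisor's breakdown JSON."""
    topics: List[Dict[str, Any]] = []
    for item in raw.get("sub_topics") or []:
        if not isinstance(item, dict) or not item.get("instruction"):
            continue  # a topic without an instruction cannot be sent to MedSwin
        priority = str(item.get("priority", "medium")).lower()
        tokens = item.get("expected_tokens", 200)
        topics.append({
            "topic": item.get("topic", "Untitled"),
            "instruction": item["instruction"],
            "expected_tokens": tokens if isinstance(tokens, int) else 200,
            "priority": priority if priority in PRIORITY_ORDER else "medium",
            "approach": item.get("approach", "unspecified"),
        })
    if not topics:
        return copy.deepcopy(FALLBACK_BREAKDOWN)
    # Stable sort: high, then medium, then low; model order is kept within a level.
    topics.sort(key=lambda t: PRIORITY_ORDER[t["priority"]])
    topics = topics[:max(1, max_topics)]
    for new_id, topic in enumerate(topics, start=1):
        topic["id"] = new_id  # renumber after sorting and truncation
    return {
        "sub_topics": topics,
        "strategy": raw.get("strategy", ""),
        "exploration_note": raw.get("exploration_note", ""),
    }
```

Returning a deep copy of the fallback keeps later per-request mutations from leaking into the shared template.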
 
  logger.warning("[GEMINI SUPERVISOR] MCP unavailable, using fallback breakdown")
  return {
  "sub_topics": [
+ {"id": 1, "topic": "Core Question", "instruction": "Address the main medical question", "expected_tokens": 200, "priority": "high", "approach": "direct answer"},
+ {"id": 2, "topic": "Clinical Details", "instruction": "Provide key clinical insights", "expected_tokens": 200, "priority": "medium", "approach": "clinical perspective"},
  ],
+ "strategy": "Sequential answer with key points",
+ "exploration_note": "Fallback breakdown - basic coverage"
  }

  try:
 
  logger.error(f"[GEMINI SUPERVISOR] Breakdown request failed: {exc}")
  return {
  "sub_topics": [
+ {"id": 1, "topic": "Core Question", "instruction": "Address the main medical question", "expected_tokens": 200, "priority": "high", "approach": "direct answer"},
  ],
+ "strategy": "Direct answer",
+ "exploration_note": "Fallback breakdown - single topic"
  }

  def gemini_supervisor_search_strategies(query: str, time_elapsed: float) -> dict:
 
  logger.info(f"[MEDSWIN] Task completed: {len(response)} chars generated")
  return response

+ async def gemini_supervisor_synthesize_async(query: str, medswin_answers: list, rag_contexts: list, search_contexts: list, breakdown: dict) -> str:
+ """
+ Gemini Supervisor: Synthesize final answer from all MedSwin responses with clear context
+ Provides better context to create a comprehensive, well-structured final answer
+ """
+ # Prepare context summary
+ context_summary = ""
+ if rag_contexts:
+ context_summary += f"Document Context Available: {len(rag_contexts)} context(s) from uploaded documents.\n"
+ if search_contexts:
+ context_summary += f"Web Search Context Available: {len(search_contexts)} search result(s).\n"
+
+ # Combine all MedSwin answers
+ all_answers_text = "\n\n---\n\n".join([f"## {i+1}. {ans}" for i, ans in enumerate(medswin_answers)])
+
+ prompt = f"""You are a supervisor agent synthesizing a comprehensive medical answer from multiple specialist responses.
+
+ Original Query: "{query}"
+
+ Context Available:
+ {context_summary}
+
+ MedSwin Specialist Responses (from {len(medswin_answers)} sub-topics):
+ {all_answers_text}
+
+ Your task:
+ 1. Synthesize all responses into a coherent, comprehensive final answer
+ 2. Integrate information from all sub-topics seamlessly
+ 3. Ensure the answer directly addresses the original query
+ 4. Maintain clinical accuracy and clarity
+ 5. Use clear structure with appropriate headings and bullet points
+ 6. Remove redundancy and contradictions
+ 7. Ensure all important points from MedSwin responses are included
+
+ Return the final synthesized answer in Markdown format. Do not add meta-commentary or explanations - just provide the final answer."""
+
+ system_prompt = "You are a medical answer synthesis supervisor. Create comprehensive, well-structured final answers from multiple specialist responses."
+
+ result = await call_agent(
+ user_prompt=prompt,
+ system_prompt=system_prompt,
+ model=GEMINI_MODEL,
+ temperature=0.3
+ )
+
+ return result.strip()
+
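
`gemini_supervisor_synthesize_async` takes a `breakdown` argument, but the body shown never reads it. The answers reach the prompt numbered `## 1.`, `## 2.` without their sub-topic names. The sketch below shows one way to label each answer with its topic before building the prompt. It assumes `medswin_answers[i]` belongs to `breakdown["sub_topics"][i]`. That pairing breaks if a task is skipped, as the `continue` in the `stream_chat` hunk allows. In that case, collecting `(topic, answer)` pairs at execution time would be safer. The helper name is hypothetical.

```python
from typing import Any, Dict, List


def label_answers(medswin_answers: List[str], breakdown: Dict[str, Any]) -> str:
    """Join MedSwin answers under headings that carry their sub-topic names.

    Assumes answers[i] belongs to breakdown["sub_topics"][i]; extra answers
    fall back to a generic numbered label.
    """
    topics = breakdown.get("sub_topics", [])
    sections = []
    for i, answer in enumerate(medswin_answers):
        if i < len(topics):
            name = topics[i].get("topic", f"Sub-topic {i + 1}")
        else:
            name = f"Sub-topic {i + 1}"
        sections.append(f"## {i + 1}. {name}\n{answer.strip()}")
    # Same separator the diff uses between answers.
    return "\n\n---\n\n".join(sections)
```

Naming each section lets the synthesizer see which angle an answer was meant to cover. That matters when it is also asked to remove redundancy and contradictions.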
+ async def gemini_supervisor_challenge_async(query: str, current_answer: str, medswin_answers: list, rag_contexts: list, search_contexts: list) -> dict:
+ """
+ Gemini Supervisor: Challenge and evaluate the current answer, suggesting improvements
+ Returns evaluation with suggestions for enhancement
+ """
+ context_info = ""
+ if rag_contexts:
+ context_info += f"Document contexts: {len(rag_contexts)} available.\n"
+ if search_contexts:
+ context_info += f"Search contexts: {len(search_contexts)} available.\n"
+
+ all_answers_text = "\n\n---\n\n".join([f"## {i+1}. {ans}" for i, ans in enumerate(medswin_answers)])
+
+ prompt = f"""You are a supervisor agent evaluating and challenging a medical answer for quality and completeness.
+
+ Original Query: "{query}"
+
+ Available Context:
+ {context_info}
+
+ MedSwin Specialist Responses:
+ {all_answers_text}
+
+ Current Synthesized Answer:
+ {current_answer[:2000]}
+
+ Evaluate this answer and provide:
+ 1. Completeness: Does it fully address the query? What's missing?
+ 2. Accuracy: Are there any inaccuracies or contradictions?
+ 3. Clarity: Is it well-structured and clear?
+ 4. Context Usage: Are document/search contexts properly utilized?
+ 5. Improvement Suggestions: Specific ways to enhance the answer
+
+ Return ONLY valid JSON:
+ {{
+ "is_optimal": true/false,
+ "completeness_score": 0-10,
+ "accuracy_score": 0-10,
+ "clarity_score": 0-10,
+ "missing_aspects": ["..."],
+ "inaccuracies": ["..."],
+ "improvement_suggestions": ["..."],
+ "needs_more_context": true/false,
+ "enhancement_instructions": "specific instructions for improving the answer"
+ }}"""
+
+ system_prompt = "You are a medical answer quality evaluator. Provide honest, constructive feedback in JSON format. Return ONLY valid JSON."
+
+ response = await call_agent(
+ user_prompt=prompt,
+ system_prompt=system_prompt,
+ model=GEMINI_MODEL,
+ temperature=0.3
+ )
+
+ try:
+ json_start = response.find('{')
+ json_end = response.rfind('}') + 1
+ if json_start >= 0 and json_end > json_start:
+ evaluation = json.loads(response[json_start:json_end])
+ logger.info(f"[GEMINI SUPERVISOR] Challenge evaluation: optimal={evaluation.get('is_optimal', False)}, scores={evaluation.get('completeness_score', 'N/A')}/{evaluation.get('accuracy_score', 'N/A')}/{evaluation.get('clarity_score', 'N/A')}")
+ return evaluation
+ else:
+ raise ValueError("Evaluation JSON not found")
+ except Exception as exc:
+ logger.error(f"[GEMINI SUPERVISOR] Challenge evaluation parsing failed: {exc}")
+ return {
+ "is_optimal": True,
+ "completeness_score": 7,
+ "accuracy_score": 7,
+ "clarity_score": 7,
+ "missing_aspects": [],
+ "inaccuracies": [],
+ "improvement_suggestions": [],
+ "needs_more_context": False,
+ "enhancement_instructions": ""
+ }
+
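
The challenge parser slices from the first `{` to the last `}`. If the model adds a trailing note that contains braces, for example `{"is_optimal": false} Also note {x}`, that slice is not valid JSON. The fallback then reports `is_optimal: True` and ends the loop early. The sketch below uses the standard library's `json.JSONDecoder.raw_decode`, which decodes the first complete object starting at a given index. It tries each `{` in turn and merges the result over the same fail-open defaults the diff uses. The function names are hypothetical.

```python
import copy
import json
from typing import Any, Dict

# Same fail-open defaults the diff returns when parsing fails.
DEFAULT_EVALUATION: Dict[str, Any] = {
    "is_optimal": True,
    "completeness_score": 7,
    "accuracy_score": 7,
    "clarity_score": 7,
    "missing_aspects": [],
    "inaccuracies": [],
    "improvement_suggestions": [],
    "needs_more_context": False,
    "enhancement_instructions": "",
}


def extract_first_json_object(text: str) -> Dict[str, Any]:
    """Return the first decodable JSON object in text, trying each '{' in turn."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            obj, _end = decoder.raw_decode(text, start)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass  # not valid JSON from here; try the next brace
        start = text.find("{", start + 1)
    raise ValueError("no JSON object found")


def parse_evaluation(text: str) -> Dict[str, Any]:
    try:
        parsed = extract_first_json_object(text)
    except ValueError:
        parsed = {}
    # Fields the model omitted keep their defaults instead of voiding the whole result.
    result = copy.deepcopy(DEFAULT_EVALUATION)
    result.update(parsed)
    return result
```

The same helper would also fit the clarity check below, which uses the identical `find('{')` / `rfind('}')` slice.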
+ async def gemini_supervisor_enhance_answer_async(query: str, current_answer: str, enhancement_instructions: str, medswin_answers: list, rag_contexts: list, search_contexts: list) -> str:
+ """
+ Gemini Supervisor: Enhance the answer based on challenge feedback
+ """
+ context_info = ""
+ if rag_contexts:
+ context_info += f"Document contexts: {len(rag_contexts)} available.\n"
+ if search_contexts:
+ context_info += f"Search contexts: {len(search_contexts)} available.\n"
+
+ all_answers_text = "\n\n---\n\n".join([f"## {i+1}. {ans}" for i, ans in enumerate(medswin_answers)])
+
+ prompt = f"""You are a supervisor agent enhancing a medical answer based on evaluation feedback.
+
+ Original Query: "{query}"
+
+ Available Context:
+ {context_info}
+
+ MedSwin Specialist Responses:
+ {all_answers_text}
+
+ Current Answer (to enhance):
+ {current_answer}
+
+ Enhancement Instructions:
+ {enhancement_instructions}
+
+ Create an enhanced version of the answer that:
+ 1. Addresses all improvement suggestions
+ 2. Fills in missing aspects
+ 3. Corrects any inaccuracies
+ 4. Improves clarity and structure
+ 5. Better utilizes available context
+ 6. Maintains all valuable information from the current answer
+
+ Return the enhanced answer in Markdown format. Do not add meta-commentary."""
+
+ system_prompt = "You are a medical answer enhancement supervisor. Improve answers based on evaluation feedback while maintaining accuracy."
+
+ result = await call_agent(
+ user_prompt=prompt,
+ system_prompt=system_prompt,
+ model=GEMINI_MODEL,
+ temperature=0.3
+ )
+
+ return result.strip()
+
+ async def gemini_supervisor_check_clarity_async(query: str, answer: str, use_web_search: bool) -> dict:
+ """
+ Gemini Supervisor: Check if answer is unclear or supervisor is unsure (only when search mode enabled)
+ Returns decision on whether to trigger additional search
+ """
+ if not use_web_search:
+ # Only check clarity when search mode is enabled
+ return {"is_unclear": False, "needs_search": False, "search_queries": []}
+
+ prompt = f"""You are a supervisor agent evaluating answer clarity and completeness.
+
+ Query: "{query}"
+
+ Current Answer:
+ {answer[:1500]}
+
+ Evaluate:
+ 1. Is the answer unclear or incomplete?
+ 2. Are there gaps that web search could fill?
+ 3. Is the supervisor (you) unsure about certain aspects?
+
+ Return ONLY valid JSON:
+ {{
+ "is_unclear": true/false,
+ "needs_search": true/false,
+ "uncertainty_areas": ["..."],
+ "search_queries": ["specific search queries to fill gaps"],
+ "rationale": "brief explanation"
+ }}
+
+ Only suggest search if the answer is genuinely unclear or has significant gaps that search could address."""
+
+ system_prompt = "You are a clarity evaluator. Assess if additional web search is needed. Return ONLY valid JSON."
+
+ response = await call_agent(
+ user_prompt=prompt,
+ system_prompt=system_prompt,
+ model=GEMINI_MODEL_LITE,
+ temperature=0.2
+ )
+
+ try:
+ json_start = response.find('{')
+ json_end = response.rfind('}') + 1
+ if json_start >= 0 and json_end > json_start:
+ evaluation = json.loads(response[json_start:json_end])
+ logger.info(f"[GEMINI SUPERVISOR] Clarity check: unclear={evaluation.get('is_unclear', False)}, needs_search={evaluation.get('needs_search', False)}")
+ return evaluation
+ else:
+ raise ValueError("Clarity check JSON not found")
+ except Exception as exc:
+ logger.error(f"[GEMINI SUPERVISOR] Clarity check parsing failed: {exc}")
+ return {"is_unclear": False, "needs_search": False, "search_queries": []}
+
+ def gemini_supervisor_synthesize(query: str, medswin_answers: list, rag_contexts: list, search_contexts: list, breakdown: dict) -> str:
+ """Wrapper to synthesize answer synchronously"""
+ if not MCP_AVAILABLE:
+ logger.warning("[GEMINI SUPERVISOR] MCP unavailable for synthesis, using simple concatenation")
+ return "\n\n".join(medswin_answers)
+
+ try:
+ loop = asyncio.get_event_loop()
+ if loop.is_running():
+ try:
+ import nest_asyncio
+ return nest_asyncio.run(gemini_supervisor_synthesize_async(query, medswin_answers, rag_contexts, search_contexts, breakdown))
+ except Exception as exc:
+ logger.error(f"[GEMINI SUPERVISOR] Nested synthesis failed: {exc}")
+ return "\n\n".join(medswin_answers)
+ return loop.run_until_complete(gemini_supervisor_synthesize_async(query, medswin_answers, rag_contexts, search_contexts, breakdown))
+ except Exception as exc:
+ logger.error(f"[GEMINI SUPERVISOR] Synthesis failed: {exc}")
+ return "\n\n".join(medswin_answers)
+
+ def gemini_supervisor_challenge(query: str, current_answer: str, medswin_answers: list, rag_contexts: list, search_contexts: list) -> dict:
+ """Wrapper to challenge answer synchronously"""
+ if not MCP_AVAILABLE:
+ return {"is_optimal": True, "completeness_score": 7, "accuracy_score": 7, "clarity_score": 7, "missing_aspects": [], "inaccuracies": [], "improvement_suggestions": [], "needs_more_context": False, "enhancement_instructions": ""}
+
+ try:
+ loop = asyncio.get_event_loop()
+ if loop.is_running():
+ try:
+ import nest_asyncio
+ return nest_asyncio.run(gemini_supervisor_challenge_async(query, current_answer, medswin_answers, rag_contexts, search_contexts))
+ except Exception as exc:
+ logger.error(f"[GEMINI SUPERVISOR] Nested challenge failed: {exc}")
+ return {"is_optimal": True, "completeness_score": 7, "accuracy_score": 7, "clarity_score": 7, "missing_aspects": [], "inaccuracies": [], "improvement_suggestions": [], "needs_more_context": False, "enhancement_instructions": ""}
+ return loop.run_until_complete(gemini_supervisor_challenge_async(query, current_answer, medswin_answers, rag_contexts, search_contexts))
+ except Exception as exc:
+ logger.error(f"[GEMINI SUPERVISOR] Challenge failed: {exc}")
+ return {"is_optimal": True, "completeness_score": 7, "accuracy_score": 7, "clarity_score": 7, "missing_aspects": [], "inaccuracies": [], "improvement_suggestions": [], "needs_more_context": False, "enhancement_instructions": ""}
+
+ def gemini_supervisor_enhance_answer(query: str, current_answer: str, enhancement_instructions: str, medswin_answers: list, rag_contexts: list, search_contexts: list) -> str:
+ """Wrapper to enhance answer synchronously"""
+ if not MCP_AVAILABLE:
+ return current_answer
+
+ try:
+ loop = asyncio.get_event_loop()
+ if loop.is_running():
+ try:
+ import nest_asyncio
+ return nest_asyncio.run(gemini_supervisor_enhance_answer_async(query, current_answer, enhancement_instructions, medswin_answers, rag_contexts, search_contexts))
+ except Exception as exc:
+ logger.error(f"[GEMINI SUPERVISOR] Nested enhancement failed: {exc}")
+ return current_answer
+ return loop.run_until_complete(gemini_supervisor_enhance_answer_async(query, current_answer, enhancement_instructions, medswin_answers, rag_contexts, search_contexts))
+ except Exception as exc:
+ logger.error(f"[GEMINI SUPERVISOR] Enhancement failed: {exc}")
+ return current_answer
+
+ def gemini_supervisor_check_clarity(query: str, answer: str, use_web_search: bool) -> dict:
+ """Wrapper to check clarity synchronously"""
+ if not MCP_AVAILABLE or not use_web_search:
+ return {"is_unclear": False, "needs_search": False, "search_queries": []}
+
+ try:
+ loop = asyncio.get_event_loop()
+ if loop.is_running():
+ try:
+ import nest_asyncio
+ return nest_asyncio.run(gemini_supervisor_check_clarity_async(query, answer, use_web_search))
+ except Exception as exc:
+ logger.error(f"[GEMINI SUPERVISOR] Nested clarity check failed: {exc}")
+ return {"is_unclear": False, "needs_search": False, "search_queries": []}
+ return loop.run_until_complete(gemini_supervisor_check_clarity_async(query, answer, use_web_search))
+ except Exception as exc:
+ logger.error(f"[GEMINI SUPERVISOR] Clarity check failed: {exc}")
+ return {"is_unclear": False, "needs_search": False, "search_queries": []}
+
1863
  async def self_reflection_gemini(answer: str, query: str) -> dict:
1864
  """Self-reflection using Gemini MCP"""
1865
  reflection_prompt = f"""Evaluate this medical answer for quality and completeness:
 
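The three wrappers in the hunk above all follow the same pattern. Each runs a Gemini MCP coroutine from synchronous code. If anything fails, it returns a neutral default: an "optimal" evaluation, the unchanged answer, or "no search needed". Below is a minimal sketch of that pattern, folded into one reusable helper. `run_sync_with_fallback` is a hypothetical name and is not part of app.py. The sketch also swaps in a different technique from nest_asyncio. When a loop is already running in the calling thread, it runs the coroutine on a fresh loop in a worker thread rather than re-entering the running loop. This assumes the coroutine does not use objects bound to the caller's loop, such as an MCP session opened on that loop.

```python
import asyncio
import concurrent.futures
from typing import Any, Callable, Coroutine, TypeVar

T = TypeVar("T")


def run_sync_with_fallback(
    make_coro: Callable[[], Coroutine[Any, Any, T]], fallback: T
) -> T:
    """Run a coroutine to completion from sync code; return `fallback` on any error.

    Hypothetical helper, not part of app.py.
    """
    try:
        asyncio.get_running_loop()
        loop_running = True
    except RuntimeError:
        # get_running_loop raises RuntimeError when no loop runs in this thread.
        loop_running = False

    try:
        if not loop_running:
            # asyncio.run creates a new loop, runs the coroutine, and closes the loop.
            return asyncio.run(make_coro())
        # A loop is already running here (e.g. inside an async UI handler), so we
        # cannot block on it; run the coroutine on a fresh loop in a worker thread.
        # Assumption: the coroutine uses no objects bound to the caller's loop.
        with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
            return pool.submit(lambda: asyncio.run(make_coro())).result()
    except Exception:
        return fallback


if __name__ == "__main__":
    DEFAULT_EVALUATION = {"is_optimal": True, "enhancement_instructions": ""}

    async def flaky_challenge() -> dict:
        raise ConnectionError("MCP server unavailable")

    print(run_sync_with_fallback(flaky_challenge, DEFAULT_EVALUATION))  # prints the fallback dict
```

The helper takes a zero-argument factory rather than a coroutine object. That way, no coroutine is created on a path that never awaits it. One trade-off applies in the worker-thread branch: the caller's event loop is blocked until the worker finishes, so this suits short supervisor calls only.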
          # Simple breakdown for direct mode
          breakdown = {
              "sub_topics": [
+                 {"id": 1, "topic": "Answer", "instruction": message, "expected_tokens": 400, "priority": "high", "approach": "direct answer"}
              ],
+             "strategy": "Direct answer",
+             "exploration_note": "Direct mode - no breakdown"
          }
      else:
          logger.info("[GEMINI SUPERVISOR] Breaking query into sub-topics...")
 
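In direct mode, the hunk above skips the Gemini breakdown. It wraps the whole message in a single high-priority sub-topic. The sketch below rebuilds that dict and adds `is_valid_breakdown`, a hypothetical check that is not in app.py. A caller could apply it to a parsed JSON breakdown before executing any tasks. The sketch makes two assumptions. First, a Gemini-produced breakdown uses the same sub-topic keys as the direct-mode dict. Second, the upper bound of four sub-topics follows the README's 2-4 range, with 1 also allowed so direct mode passes.

```python
from typing import Any

# Keys carried by each sub-topic in the direct-mode breakdown shown in the diff.
SUB_TOPIC_KEYS = {"id", "topic", "instruction", "expected_tokens", "priority", "approach"}


def direct_breakdown(message: str) -> dict[str, Any]:
    """Wrap the whole query in one high-priority sub-topic (mirrors the diff)."""
    return {
        "sub_topics": [
            {
                "id": 1,
                "topic": "Answer",
                "instruction": message,
                "expected_tokens": 400,
                "priority": "high",
                "approach": "direct answer",
            }
        ],
        "strategy": "Direct answer",
        "exploration_note": "Direct mode - no breakdown",
    }


def is_valid_breakdown(breakdown: Any, max_topics: int = 4) -> bool:
    """Hypothetical check: 1..max_topics sub-topics, each carrying every expected key."""
    if not isinstance(breakdown, dict):
        return False
    topics = breakdown.get("sub_topics")
    if not isinstance(topics, list) or not 1 <= len(topics) <= max_topics:
        return False
    # issubset accepts any iterable; iterating a dict yields its keys.
    return all(isinstance(t, dict) and SUB_TOPIC_KEYS.issubset(t) for t in topics)
```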
              # Continue with next task
              continue
 
+     # ===== STEP 5: GEMINI SUPERVISOR - Synthesize final answer with clear context =====
+     logger.info("[GEMINI SUPERVISOR] Synthesizing final answer from all MedSwin responses...")
+     raw_medswin_answers = [ans.split('\n\n', 1)[1] if '\n\n' in ans else ans for ans in medswin_answers]  # Remove headers for synthesis
+     final_answer = gemini_supervisor_synthesize(message, raw_medswin_answers, rag_contexts, search_contexts, breakdown)
+
+     if not final_answer or len(final_answer.strip()) < 50:
+         # Fallback to simple concatenation if synthesis fails
+         logger.warning("[GEMINI SUPERVISOR] Synthesis failed or too short, using concatenation")
+         final_answer = "\n\n".join(medswin_answers) if medswin_answers else "I apologize, but I was unable to generate a response."
+
      # Clean final answer - ensure no tables, only Markdown bullets
      if "|" in final_answer and "---" in final_answer:
          logger.warning("[MEDSWIN] Final answer contains tables, converting to bullets")
 
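STEP 5 in the hunk above strips each stored MedSwin answer's header before synthesis. It falls back to joining the original answers, headers included, when the synthesized text is missing or shorter than 50 characters. Below is a minimal sketch of that logic. The Gemini call is replaced by an injected `synthesize` callable so the sketch runs standalone. `strip_header`, `synthesize_or_concatenate`, and the `**Task N**` header format in the tests are assumptions, not code from app.py.

```python
from typing import Callable, List, Optional

FALLBACK_MESSAGE = "I apologize, but I was unable to generate a response."


def strip_header(answer: str) -> str:
    """Drop everything before the first blank line (assumed per-task header)."""
    return answer.split("\n\n", 1)[1] if "\n\n" in answer else answer


def synthesize_or_concatenate(
    medswin_answers: List[str],
    synthesize: Callable[[List[str]], Optional[str]],
    min_chars: int = 50,
) -> str:
    """Synthesize from header-free answers; fall back to joining the originals."""
    raw_answers = [strip_header(a) for a in medswin_answers]
    final_answer = synthesize(raw_answers)
    if not final_answer or len(final_answer.strip()) < min_chars:
        # As in the diff, the fallback joins the answers *with* their headers.
        final_answer = "\n\n".join(medswin_answers) if medswin_answers else FALLBACK_MESSAGE
    return final_answer
```

Because the fallback keeps the headers, the reader still sees which sub-topic each concatenated paragraph answers.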
                  cleaned_lines.append(line)
          final_answer = '\n'.join(cleaned_lines)
 
+     # ===== STEP 6: GEMINI SUPERVISOR - Challenge and enhance answer iteratively =====
+     max_challenge_iterations = 2  # Limit iterations to avoid timeout
+     challenge_iteration = 0
+
+     while challenge_iteration < max_challenge_iterations and elapsed() < soft_timeout - 15:
+         challenge_iteration += 1
+         logger.info(f"[GEMINI SUPERVISOR] Challenge iteration {challenge_iteration}/{max_challenge_iterations}...")
+
+         evaluation = gemini_supervisor_challenge(message, final_answer, raw_medswin_answers, rag_contexts, search_contexts)
+
+         if evaluation.get("is_optimal", False):
+             logger.info(f"[GEMINI SUPERVISOR] Answer confirmed optimal after {challenge_iteration} iteration(s)")
+             break
+
+         enhancement_instructions = evaluation.get("enhancement_instructions", "")
+         if not enhancement_instructions:
+             logger.info("[GEMINI SUPERVISOR] No enhancement instructions, considering answer optimal")
+             break
+
+         logger.info("[GEMINI SUPERVISOR] Enhancing answer based on feedback...")
+         enhanced_answer = gemini_supervisor_enhance_answer(
+             message, final_answer, enhancement_instructions, raw_medswin_answers, rag_contexts, search_contexts
+         )
+
+         if enhanced_answer and len(enhanced_answer.strip()) > len(final_answer.strip()) * 0.8:  # Accept only if the enhanced answer keeps more than 80% of the current length
+             final_answer = enhanced_answer
+             logger.info(f"[GEMINI SUPERVISOR] Answer enhanced (new length: {len(final_answer)} chars)")
+         else:
+             logger.info("[GEMINI SUPERVISOR] Enhancement did not improve answer significantly, stopping")
+             break
+
+     # ===== STEP 7: Conditional search trigger (only when search mode enabled) =====
+     if final_use_web_search and elapsed() < soft_timeout - 10:
+         logger.info("[GEMINI SUPERVISOR] Checking if additional search is needed...")
+         clarity_check = gemini_supervisor_check_clarity(message, final_answer, final_use_web_search)
+
+         if clarity_check.get("needs_search", False) and clarity_check.get("search_queries"):
+             logger.info(f"[GEMINI SUPERVISOR] Triggering additional search: {clarity_check.get('search_queries', [])}")
+             additional_search_results = []
+             for search_query in clarity_check.get("search_queries", [])[:3]:  # Limit to 3 additional searches
+                 if elapsed() >= soft_timeout - 5:
+                     break
+                 results = search_web(search_query, max_results=2)
+                 additional_search_results.extend(results)
+                 web_urls.extend([r.get('url', '') for r in results if r.get('url')])
+
+             if additional_search_results:
+                 logger.info(f"[GEMINI SUPERVISOR] Summarizing {len(additional_search_results)} additional search results...")
+                 additional_summary = summarize_web_content(additional_search_results, message)
+                 if additional_summary:
+                     # Enhance answer with additional search context
+                     search_contexts.append(additional_summary)
+                     logger.info("[GEMINI SUPERVISOR] Enhancing answer with additional search context...")
+                     enhanced_with_search = gemini_supervisor_enhance_answer(
+                         message, final_answer,
+                         f"Incorporate the following additional information from web search: {additional_summary}",
+                         raw_medswin_answers, rag_contexts, search_contexts
+                     )
+                     if enhanced_with_search and len(enhanced_with_search.strip()) > 50:
+                         final_answer = enhanced_with_search
+                         logger.info("[GEMINI SUPERVISOR] Answer enhanced with additional search context")
+
+     citations_text = ""
+
+     # ===== STEP 8: Finalize answer (translate, add citations, format) =====
      # Translate back if needed
      if needs_translation and final_answer:
          logger.info(f"[GEMINI SUPERVISOR] Translating response back to {original_lang}...")