MCP-1st-Birthday/MedLLM-Agent

LiamKhoaLe committed about 1 month ago

Commit

ab69c75

1 Parent(s): b32ca93

Upd MAC agentic strats

Files changed (2)

README.md +36 -7
app.py +415 -26

README.md CHANGED

@@ -59,9 +59,12 @@ tags:
 - **Text-to-Speech**: Voice output generation using Maya1 TTS model (optional, fallback to MCP if unavailable) plus a one-click "Play Response" control for the latest answer
 ### 🤝 **MAC Architecture (Multi-Agent Collaboration)**
-- **Gemini Supervisor**: Orchestrates query processing by breaking queries into 2-4 focused sub-topics (JSON format)
 - **MedSwin Specialist**: Executes tasks sequentially, providing concise clinical answers
 - **Search Mode**: Gemini creates 1-4 search strategies → executes ddgs searches (1-2 sources each) → summarizes briefly
 - **RAG Mode**: Gemini brainstorms retrieved documents into 1-4 short contexts for MedSwin decision-making
 - **Clean Output**: All internal thoughts/conversations are logged only; users see only the final answer
 - **Markdown Format**: Final answers use bullet points (tables automatically converted)
@@ -152,8 +155,11 @@ MedLLM Agent is designed to support **doctors, clinicians, and medical specialis
 #### 1. **MAC Architecture (Multi-Agent Collaboration)**
 - **Gemini Supervisor Agent**:
-  - Breaks user queries into 2-4 focused sub-topics (JSON format)
   - In search mode: creates 1-4 search strategies, executes ddgs (1-2 sources each), summarizes briefly
   - In RAG mode: brainstorms retrieved documents into 1-4 concise contexts
   - All supervisor decisions logged internally, not displayed
@@ -196,9 +202,11 @@ MedLLM Agent is designed to support **doctors, clinicians, and medical specialis
 ### **How It Works: MAC Architecture in Action**
-1. **Gemini Supervisor - Query Breakdown** → Analyzes query and breaks into 2-4 sub-topics (JSON):
    - Example: "What are the treatment options for Type 2 diabetes in elderly patients with renal impairment?"
-   - Creates structured sub-topics: treatment options, age considerations, renal function impact
    - All planning logged internally, not displayed to user
 2. **Gemini Supervisor - Context Preparation**:
@@ -212,8 +220,26 @@ MedLLM Agent is designed to support **doctors, clinicians, and medical specialis
    - Generates concise clinical answers (Markdown bullets, no tables)
    - All execution logged internally
-4. **Final Answer Assembly**:
-   - Combines all MedSwin task answers
    - Converts any tables to Markdown bullets
    - Adds citations if web sources used
    - Translates back if needed
@@ -225,7 +251,10 @@ MedLLM Agent is designed to support **doctors, clinicians, and medical specialis
 ✅ **Evidence-Based Decisions**: Grounds answers in uploaded documents and current medical literature
 ✅ **Reduced Hallucination**: RAG ensures answers are based on actual documents and verified sources
 ✅ **Comprehensive Coverage**: Combines institutional knowledge (documents) with current medical knowledge (web)
-✅ **Quality Assurance**: Self-reflection ensures high-quality, complete answers
 ✅ **Scalability**: Handles multiple languages, complex queries, and large document libraries
 ✅ **Clinical Workflow Integration**: Designed to fit into existing clinical decision-making processes
 ✅ **MCP Protocol**: Standardized tool integration for reliable, maintainable web search capabilities

 - **Text-to-Speech**: Voice output generation using Maya1 TTS model (optional, fallback to MCP if unavailable) plus a one-click "Play Response" control for the latest answer
 ### 🤝 **MAC Architecture (Multi-Agent Collaboration)**
+- **Gemini Supervisor**: Orchestrates query processing by breaking queries into flexible sub-topics (up to 10 based on complexity, explores different approaches)
 - **MedSwin Specialist**: Executes tasks sequentially, providing concise clinical answers
+- **Enhanced Synthesis**: Supervisor synthesizes all MedSwin responses with clear context into comprehensive final answers
+- **Iterative Improvement**: Supervisor challenges and enhances answers until confirmed optimal (up to 2 iterations)
 - **Search Mode**: Gemini creates 1-4 search strategies → executes ddgs searches (1-2 sources each) → summarizes briefly
+- **Conditional Search Trigger**: When search mode enabled, supervisor can trigger additional searches if answer is unclear or has gaps
 - **RAG Mode**: Gemini brainstorms retrieved documents into 1-4 short contexts for MedSwin decision-making
 - **Clean Output**: All internal thoughts/conversations are logged only; users see only the final answer
 - **Markdown Format**: Final answers use bullet points (tables automatically converted)
 #### 1. **MAC Architecture (Multi-Agent Collaboration)**
 - **Gemini Supervisor Agent**:
+  - Breaks user queries into flexible sub-topics (up to 10 based on complexity, explores different approaches/angles)
+  - Synthesizes all MedSwin responses with clear context into comprehensive final answers
+  - Challenges and enhances answers iteratively until confirmed optimal (up to 2 iterations)
   - In search mode: creates 1-4 search strategies, executes ddgs (1-2 sources each), summarizes briefly
+  - Conditional search trigger: Can trigger additional searches if answer is unclear or has gaps (only when search mode enabled)
   - In RAG mode: brainstorms retrieved documents into 1-4 concise contexts
   - All supervisor decisions logged internally, not displayed
 ### **How It Works: MAC Architecture in Action**
+1. **Gemini Supervisor - Query Breakdown** → Analyzes query and breaks into flexible sub-topics (up to 10 based on complexity):
    - Example: "What are the treatment options for Type 2 diabetes in elderly patients with renal impairment?"
+   - Explores different approaches (clinical, diagnostic, treatment, prevention perspectives)
+   - Creates structured sub-topics: treatment options, age considerations, renal function impact, drug interactions, monitoring protocols
+   - Number of subtasks adapts to query complexity (not limited to 4)
    - All planning logged internally, not displayed to user
 2. **Gemini Supervisor - Context Preparation**:
    - Generates concise clinical answers (Markdown bullets, no tables)
    - All execution logged internally
+4. **Gemini Supervisor - Answer Synthesis**:
+   - Synthesizes all MedSwin responses with clear context
+   - Integrates information from all sub-topics seamlessly
+   - Creates coherent, comprehensive final answer
+   - Provides better context than simple concatenation
+5. **Gemini Supervisor - Challenge & Enhancement Loop**:
+   - Evaluates answer quality (completeness, accuracy, clarity)
+   - Challenges answer if not optimal
+   - Provides specific enhancement instructions
+   - Enhances answer iteratively (up to 2 iterations)
+   - Continues until answer confirmed optimal
+6. **Conditional Search Trigger** (only when search mode enabled):
+   - Supervisor checks if answer is unclear or has gaps
+   - If needed, generates specific search queries to fill gaps
+   - Executes additional searches
+   - Enhances answer with new search context
+7. **Final Answer Assembly**:
    - Converts any tables to Markdown bullets
    - Adds citations if web sources used
    - Translates back if needed
 ✅ **Evidence-Based Decisions**: Grounds answers in uploaded documents and current medical literature
 ✅ **Reduced Hallucination**: RAG ensures answers are based on actual documents and verified sources
 ✅ **Comprehensive Coverage**: Combines institutional knowledge (documents) with current medical knowledge (web)
+✅ **Enhanced Quality**: Iterative challenge loop ensures answers are optimal before delivery
+✅ **Flexible Task Breakdown**: Adapts to query complexity with flexible subtask generation (not limited to 4 steps)
+✅ **Intelligent Search**: Conditional search trigger fills gaps when answers are unclear
+✅ **Better Context**: Enhanced synthesis provides clearer, more comprehensive final answers
 ✅ **Scalability**: Handles multiple languages, complex queries, and large document libraries
 ✅ **Clinical Workflow Integration**: Designed to fit into existing clinical decision-making processes
 ✅ **MCP Protocol**: Standardized tool integration for reliable, maintainable web search capabilities
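
Steps 4 to 6 of the updated README describe a bounded refine loop: evaluate the synthesized answer, stop if it is judged optimal, otherwise apply the enhancement instructions, for at most 2 iterations. The sketch below is a minimal illustration of that control flow only. `refine_answer`, `fake_challenge`, `fake_enhance`, and the `time_budget_s` parameter are hypothetical names that do not appear in the commit, and the real supervisor calls go through Gemini rather than local stubs.

```python
import time


def refine_answer(query, answer, challenge, enhance,
                  max_iterations=2, time_budget_s=30.0):
    """Challenge and enhance `answer` until it is judged optimal or a limit is hit.

    `challenge` returns a dict with "is_optimal" and "enhancement_instructions";
    `enhance` returns the rewritten answer. Both are injected so the loop can be
    exercised without any model call.
    """
    start = time.monotonic()
    iterations = 0
    while iterations < max_iterations and time.monotonic() - start < time_budget_s:
        iterations += 1
        evaluation = challenge(query, answer)
        if evaluation.get("is_optimal", False):
            break  # supervisor accepted the answer
        instructions = evaluation.get("enhancement_instructions", "")
        if not instructions:
            break  # nothing actionable to apply
        answer = enhance(query, answer, instructions)
    return answer, iterations


# Hypothetical stand-ins for the supervisor calls (placeholder text only).
def fake_challenge(query, answer):
    if "monitoring" in answer:
        return {"is_optimal": True}
    return {"is_optimal": False, "enhancement_instructions": "add a monitoring section"}


def fake_enhance(query, answer, instructions):
    return answer + "\n- monitoring: placeholder bullet"


final_answer, used_iterations = refine_answer(
    "example query", "- first bullet", fake_challenge, fake_enhance
)
```

With these stubs the first iteration requests an enhancement and the second one accepts the result, so the loop ends inside its 2-iteration cap.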

app.py CHANGED

@@ -1172,7 +1172,7 @@ def autonomous_execution_strategy(reasoning: dict, plan: dict, use_rag: bool, us
 async def gemini_supervisor_breakdown_async(query: str, use_rag: bool, use_web_search: bool, time_elapsed: float, max_duration: int = 120) -> dict:
     """
-    Gemini Supervisor: Break user query into 2-4 sub-topics (JSON format)
     This is the main supervisor function that orchestrates the MAC architecture.
     All internal thoughts are logged, not displayed.
     """
@@ -1186,12 +1186,20 @@ async def gemini_supervisor_breakdown_async(query: str, use_rag: bool, use_web_s
     if not mode_description:
         mode_description.append("Direct answer mode - no additional context")
     prompt = f"""You are a supervisor agent coordinating with a MedSwin medical specialist model.
-Break the following medical query into 2-4 focused sub-topics that MedSwin can answer sequentially.
 Query: "{query}"
 Mode: {', '.join(mode_description)}
 Time Remaining: ~{remaining_time:.1f}s
 Return ONLY valid JSON (no markdown, no tables, no explanations):
 {{
@@ -1201,17 +1209,23 @@ Return ONLY valid JSON (no markdown, no tables, no explanations):
       "topic": "concise topic name",
       "instruction": "specific directive for MedSwin to answer this topic",
       "expected_tokens": 200,
-      "priority": "high|medium|low"
     }},
     ...
   ],
-  "max_topics": 4,
-  "strategy": "brief strategy description"
 }}
-Keep topics focused and actionable. Each topic should be answerable in ~200 tokens by MedSwin."""
-    system_prompt = "You are a medical query supervisor. Break queries into structured JSON sub-topics. Return ONLY valid JSON."
     response = await call_agent(
         user_prompt=prompt,
@@ -1236,11 +1250,11 @@ Keep topics focused and actionable. Each topic should be answerable in ~200 toke
         # Fallback: simple breakdown
         breakdown = {
             "sub_topics": [
-                {"id": 1, "topic": "Core Question", "instruction": "Address the main medical question", "expected_tokens": 200, "priority": "high"},
-                {"id": 2, "topic": "Clinical Details", "instruction": "Provide key clinical insights", "expected_tokens": 200, "priority": "medium"},
             ],
-            "max_topics": 2,
-            "strategy": "Sequential answer with key points"
         }
         logger.warning(f"[GEMINI SUPERVISOR] Using fallback breakdown")
         return breakdown
@@ -1367,11 +1381,11 @@ def gemini_supervisor_breakdown(query: str, use_rag: bool, use_web_search: bool,
         logger.warning("[GEMINI SUPERVISOR] MCP unavailable, using fallback breakdown")
         return {
             "sub_topics": [
-                {"id": 1, "topic": "Core Question", "instruction": "Address the main medical question", "expected_tokens": 200, "priority": "high"},
-                {"id": 2, "topic": "Clinical Details", "instruction": "Provide key clinical insights", "expected_tokens": 200, "priority": "medium"},
             ],
-            "max_topics": 2,
-            "strategy": "Sequential answer with key points"
         }
     try:
@@ -1392,10 +1406,10 @@ def gemini_supervisor_breakdown(query: str, use_rag: bool, use_web_search: bool,
         logger.error(f"[GEMINI SUPERVISOR] Breakdown request failed: {exc}")
         return {
             "sub_topics": [
-                {"id": 1, "topic": "Core Question", "instruction": "Address the main medical question", "expected_tokens": 200, "priority": "high"},
             ],
-            "max_topics": 1,
-            "strategy": "Direct answer"
         }
 def gemini_supervisor_search_strategies(query: str, time_elapsed: float) -> dict:
@@ -1541,6 +1555,311 @@ def execute_medswin_task(
     logger.info(f"[MEDSWIN] Task completed: {len(response)} chars generated")
     return response
 async def self_reflection_gemini(answer: str, query: str) -> dict:
     """Self-reflection using Gemini MCP"""
     reflection_prompt = f"""Evaluate this medical answer for quality and completeness:
@@ -1879,10 +2198,10 @@ def stream_chat(
         # Simple breakdown for direct mode
         breakdown = {
             "sub_topics": [
-                {"id": 1, "topic": "Answer", "instruction": message, "expected_tokens": 400, "priority": "high"}
             ],
-            "max_topics": 1,
-            "strategy": "Direct answer"
         }
     else:
         logger.info("[GEMINI SUPERVISOR] Breaking query into sub-topics...")
@@ -2017,10 +2336,16 @@ def stream_chat(
             # Continue with next task
             continue
-    # ===== STEP 5: Combine all MedSwin answers into final answer =====
-    final_answer = "\n\n".join(medswin_answers) if medswin_answers else "I apologize, but I was unable to generate a response."
-    citations_text = ""
     # Clean final answer - ensure no tables, only Markdown bullets
     if "|" in final_answer and "---" in final_answer:
         logger.warning("[MEDSWIN] Final answer contains tables, converting to bullets")
@@ -2036,7 +2361,71 @@ def stream_chat(
                 cleaned_lines.append(line)
         final_answer = '\n'.join(cleaned_lines)
-    # ===== STEP 6: Finalize answer (translate, add citations, format) =====
     # Translate back if needed
     if needs_translation and final_answer:
         logger.info(f"[GEMINI SUPERVISOR] Translating response back to {original_lang}...")

 async def gemini_supervisor_breakdown_async(query: str, use_rag: bool, use_web_search: bool, time_elapsed: float, max_duration: int = 120) -> dict:
     """
+    Gemini Supervisor: Break user query into sub-topics (flexible number, explore different approaches)
     This is the main supervisor function that orchestrates the MAC architecture.
     All internal thoughts are logged, not displayed.
     """
     if not mode_description:
         mode_description.append("Direct answer mode - no additional context")
+    # Calculate reasonable max topics based on time remaining
+    # Allow more subtasks if we have time, but be flexible
+    estimated_time_per_task = 8  # seconds per task
+    max_topics_by_time = max(2, int((remaining_time - 20) / estimated_time_per_task))
+    max_topics = min(max_topics_by_time, 10)  # Cap at 10, but allow more than 4
     prompt = f"""You are a supervisor agent coordinating with a MedSwin medical specialist model.
+Break the following medical query into focused sub-topics that MedSwin can answer sequentially.
+Explore different potential approaches to comprehensively address the topic.
 Query: "{query}"
 Mode: {', '.join(mode_description)}
 Time Remaining: ~{remaining_time:.1f}s
+Maximum Topics: {max_topics} (adjust based on complexity - use as many as needed for thorough coverage)
 Return ONLY valid JSON (no markdown, no tables, no explanations):
 {{
       "topic": "concise topic name",
       "instruction": "specific directive for MedSwin to answer this topic",
       "expected_tokens": 200,
+      "priority": "high|medium|low",
+      "approach": "brief description of approach/angle for this topic"
     }},
     ...
   ],
+  "strategy": "brief strategy description explaining the breakdown approach",
+  "exploration_note": "brief note on different approaches explored"
 }}
+Guidelines:
+- Break down the query into as many subtasks as needed for comprehensive coverage
+- Explore different angles/approaches (e.g., clinical, diagnostic, treatment, prevention, research perspectives)
+- Each topic should be focused and answerable in ~200 tokens by MedSwin
+- Prioritize topics by importance (high priority first)
+- Don't limit yourself to 4 topics - use more if the query is complex or multi-faceted"""
+    system_prompt = "You are a medical query supervisor. Break queries into structured JSON sub-topics, exploring different approaches. Return ONLY valid JSON."
     response = await call_agent(
         user_prompt=prompt,
         # Fallback: simple breakdown
         breakdown = {
             "sub_topics": [
+                {"id": 1, "topic": "Core Question", "instruction": "Address the main medical question", "expected_tokens": 200, "priority": "high", "approach": "direct answer"},
+                {"id": 2, "topic": "Clinical Details", "instruction": "Provide key clinical insights", "expected_tokens": 200, "priority": "medium", "approach": "clinical perspective"},
             ],
+            "strategy": "Sequential answer with key points",
+            "exploration_note": "Fallback breakdown - basic coverage"
         }
         logger.warning(f"[GEMINI SUPERVISOR] Using fallback breakdown")
         return breakdown
         logger.warning("[GEMINI SUPERVISOR] MCP unavailable, using fallback breakdown")
         return {
             "sub_topics": [
+                {"id": 1, "topic": "Core Question", "instruction": "Address the main medical question", "expected_tokens": 200, "priority": "high", "approach": "direct answer"},
+                {"id": 2, "topic": "Clinical Details", "instruction": "Provide key clinical insights", "expected_tokens": 200, "priority": "medium", "approach": "clinical perspective"},
             ],
+            "strategy": "Sequential answer with key points",
+            "exploration_note": "Fallback breakdown - basic coverage"
         }
     try:
         logger.error(f"[GEMINI SUPERVISOR] Breakdown request failed: {exc}")
         return {
             "sub_topics": [
+                {"id": 1, "topic": "Core Question", "instruction": "Address the main medical question", "expected_tokens": 200, "priority": "high", "approach": "direct answer"},
             ],
+            "strategy": "Direct answer",
+            "exploration_note": "Fallback breakdown - single topic"
         }
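
The new topic cap in `gemini_supervisor_breakdown_async` is plain arithmetic on the remaining time, so it is easy to check by hand. The sketch below reproduces the three lines from the diff in a standalone function. The function name `max_topics_for` and its keyword parameters are hypothetical; the constants (8 seconds per task, a 20-second reserve, a cap of 10, a floor of 2) are taken from the commit.

```python
def max_topics_for(remaining_time: float,
                   seconds_per_task: float = 8,
                   reserve: float = 20,
                   cap: int = 10) -> int:
    """Mirror of the cap computed in the diff: time-based estimate, floor 2, cap 10."""
    by_time = max(2, int((remaining_time - reserve) / seconds_per_task))
    return min(by_time, cap)


for seconds in (120, 60, 30, 10):
    print(seconds, max_topics_for(seconds))
# 120 10
# 60 5
# 30 2
# 10 2
```

Note that the floor of 2 still applies when the time budget is already below the 20-second reserve, so the prompt will advertise two topics even with very little time left.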
 def gemini_supervisor_search_strategies(query: str, time_elapsed: float) -> dict:
     logger.info(f"[MEDSWIN] Task completed: {len(response)} chars generated")
     return response
+async def gemini_supervisor_synthesize_async(query: str, medswin_answers: list, rag_contexts: list, search_contexts: list, breakdown: dict) -> str:
+    """
+    Gemini Supervisor: Synthesize final answer from all MedSwin responses with clear context
+    Provides better context to create a comprehensive, well-structured final answer
+    """
+    # Prepare context summary
+    context_summary = ""
+    if rag_contexts:
+        context_summary += f"Document Context Available: {len(rag_contexts)} context(s) from uploaded documents.\n"
+    if search_contexts:
+        context_summary += f"Web Search Context Available: {len(search_contexts)} search result(s).\n"
+    # Combine all MedSwin answers
+    all_answers_text = "\n\n---\n\n".join([f"## {i+1}. {ans}" for i, ans in enumerate(medswin_answers)])
+    prompt = f"""You are a supervisor agent synthesizing a comprehensive medical answer from multiple specialist responses.
+Original Query: "{query}"
+Context Available:
+{context_summary}
+MedSwin Specialist Responses (from {len(medswin_answers)} sub-topics):
+{all_answers_text}
+Your task:
+1. Synthesize all responses into a coherent, comprehensive final answer
+2. Integrate information from all sub-topics seamlessly
+3. Ensure the answer directly addresses the original query
+4. Maintain clinical accuracy and clarity
+5. Use clear structure with appropriate headings and bullet points
+6. Remove redundancy and contradictions
+7. Ensure all important points from MedSwin responses are included
+Return the final synthesized answer in Markdown format. Do not add meta-commentary or explanations - just provide the final answer."""
+    system_prompt = "You are a medical answer synthesis supervisor. Create comprehensive, well-structured final answers from multiple specialist responses."
+    result = await call_agent(
+        user_prompt=prompt,
+        system_prompt=system_prompt,
+        model=GEMINI_MODEL,
+        temperature=0.3
+    )
+    return result.strip()
+async def gemini_supervisor_challenge_async(query: str, current_answer: str, medswin_answers: list, rag_contexts: list, search_contexts: list) -> dict:
+    """
+    Gemini Supervisor: Challenge and evaluate the current answer, suggesting improvements
+    Returns evaluation with suggestions for enhancement
+    """
+    context_info = ""
+    if rag_contexts:
+        context_info += f"Document contexts: {len(rag_contexts)} available.\n"
+    if search_contexts:
+        context_info += f"Search contexts: {len(search_contexts)} available.\n"
+    all_answers_text = "\n\n---\n\n".join([f"## {i+1}. {ans}" for i, ans in enumerate(medswin_answers)])
+    prompt = f"""You are a supervisor agent evaluating and challenging a medical answer for quality and completeness.
+Original Query: "{query}"
+Available Context:
+{context_info}
+MedSwin Specialist Responses:
+{all_answers_text}
+Current Synthesized Answer:
+{current_answer[:2000]}
+Evaluate this answer and provide:
+1. Completeness: Does it fully address the query? What's missing?
+2. Accuracy: Are there any inaccuracies or contradictions?
+3. Clarity: Is it well-structured and clear?
+4. Context Usage: Are document/search contexts properly utilized?
+5. Improvement Suggestions: Specific ways to enhance the answer
+Return ONLY valid JSON:
+{{
+  "is_optimal": true/false,
+  "completeness_score": 0-10,
+  "accuracy_score": 0-10,
+  "clarity_score": 0-10,
+  "missing_aspects": ["..."],
+  "inaccuracies": ["..."],
+  "improvement_suggestions": ["..."],
+  "needs_more_context": true/false,
+  "enhancement_instructions": "specific instructions for improving the answer"
+}}"""
+    system_prompt = "You are a medical answer quality evaluator. Provide honest, constructive feedback in JSON format. Return ONLY valid JSON."
+    response = await call_agent(
+        user_prompt=prompt,
+        system_prompt=system_prompt,
+        model=GEMINI_MODEL,
+        temperature=0.3
+    )
+    try:
+        json_start = response.find('{')
+        json_end = response.rfind('}') + 1
+        if json_start >= 0 and json_end > json_start:
+            evaluation = json.loads(response[json_start:json_end])
+            logger.info(f"[GEMINI SUPERVISOR] Challenge evaluation: optimal={evaluation.get('is_optimal', False)}, scores={evaluation.get('completeness_score', 'N/A')}/{evaluation.get('accuracy_score', 'N/A')}/{evaluation.get('clarity_score', 'N/A')}")
+            return evaluation
+        else:
+            raise ValueError("Evaluation JSON not found")
+    except Exception as exc:
+        logger.error(f"[GEMINI SUPERVISOR] Challenge evaluation parsing failed: {exc}")
+        return {
+            "is_optimal": True,
+            "completeness_score": 7,
+            "accuracy_score": 7,
+            "clarity_score": 7,
+            "missing_aspects": [],
+            "inaccuracies": [],
+            "improvement_suggestions": [],
+            "needs_more_context": False,
+            "enhancement_instructions": ""
+        }
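
The challenge function locates the JSON payload by slicing from the first `{` to the last `}` in the model reply, then falls back to a default evaluation on any failure. A minimal standalone version of that parsing step is sketched below; `extract_json_object` and `DEFAULT_EVALUATION` are hypothetical names, and the default dict is shortened for illustration.

```python
import json


def extract_json_object(text: str, default: dict) -> dict:
    """Parse the outermost {...} span of `text`, or return a copy of `default`."""
    start = text.find("{")
    end = text.rfind("}") + 1
    if start < 0 or end <= start:
        return dict(default)
    try:
        parsed = json.loads(text[start:end])
    except json.JSONDecodeError:
        return dict(default)
    return parsed if isinstance(parsed, dict) else dict(default)


# Shortened, illustrative default; the commit uses a larger dict.
DEFAULT_EVALUATION = {"is_optimal": True, "enhancement_instructions": ""}

ok = extract_json_object(
    'Sure! {"is_optimal": false, "completeness_score": 6} Done.',
    DEFAULT_EVALUATION,
)
bad = extract_json_object("no json here", DEFAULT_EVALUATION)
```

One consequence of the fallback in the diff is worth keeping in mind: because the default sets `is_optimal` to `True`, a malformed reply ends the challenge loop instead of triggering another iteration.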
+async def gemini_supervisor_enhance_answer_async(query: str, current_answer: str, enhancement_instructions: str, medswin_answers: list, rag_contexts: list, search_contexts: list) -> str:
+    """
+    Gemini Supervisor: Enhance the answer based on challenge feedback
+    """
+    context_info = ""
+    if rag_contexts:
+        context_info += f"Document contexts: {len(rag_contexts)} available.\n"
+    if search_contexts:
+        context_info += f"Search contexts: {len(search_contexts)} available.\n"
+    all_answers_text = "\n\n---\n\n".join([f"## {i+1}. {ans}" for i, ans in enumerate(medswin_answers)])
+    prompt = f"""You are a supervisor agent enhancing a medical answer based on evaluation feedback.
+Original Query: "{query}"
+Available Context:
+{context_info}
+MedSwin Specialist Responses:
+{all_answers_text}
+Current Answer (to enhance):
+{current_answer}
+Enhancement Instructions:
+{enhancement_instructions}
+Create an enhanced version of the answer that:
+1. Addresses all improvement suggestions
+2. Fills in missing aspects
+3. Corrects any inaccuracies
+4. Improves clarity and structure
+5. Better utilizes available context
+6. Maintains all valuable information from the current answer
+Return the enhanced answer in Markdown format. Do not add meta-commentary."""
+    system_prompt = "You are a medical answer enhancement supervisor. Improve answers based on evaluation feedback while maintaining accuracy."
+    result = await call_agent(
+        user_prompt=prompt,
+        system_prompt=system_prompt,
+        model=GEMINI_MODEL,
+        temperature=0.3
+    )
+    return result.strip()
+async def gemini_supervisor_check_clarity_async(query: str, answer: str, use_web_search: bool) -> dict:
+    """
+    Gemini Supervisor: Check if answer is unclear or supervisor is unsure (only when search mode enabled)
+    Returns decision on whether to trigger additional search
+    """
+    if not use_web_search:
+        # Only check clarity when search mode is enabled
+        return {"is_unclear": False, "needs_search": False, "search_queries": []}
+    prompt = f"""You are a supervisor agent evaluating answer clarity and completeness.
+Query: "{query}"
+Current Answer:
+{answer[:1500]}
+Evaluate:
+1. Is the answer unclear or incomplete?
+2. Are there gaps that web search could fill?
+3. Is the supervisor (you) unsure about certain aspects?
+Return ONLY valid JSON:
+{{
+  "is_unclear": true/false,
+  "needs_search": true/false,
+  "uncertainty_areas": ["..."],
+  "search_queries": ["specific search queries to fill gaps"],
+  "rationale": "brief explanation"
+}}
+Only suggest search if the answer is genuinely unclear or has significant gaps that search could address."""
+    system_prompt = "You are a clarity evaluator. Assess if additional web search is needed. Return ONLY valid JSON."
+    response = await call_agent(
+        user_prompt=prompt,
+        system_prompt=system_prompt,
+        model=GEMINI_MODEL_LITE,
+        temperature=0.2
+    )
+    try:
+        json_start = response.find('{')
+        json_end = response.rfind('}') + 1
+        if json_start >= 0 and json_end > json_start:
+            evaluation = json.loads(response[json_start:json_end])
+            logger.info(f"[GEMINI SUPERVISOR] Clarity check: unclear={evaluation.get('is_unclear', False)}, needs_search={evaluation.get('needs_search', False)}")
+            return evaluation
+        else:
+            raise ValueError("Clarity check JSON not found")
+    except Exception as exc:
+        logger.error(f"[GEMINI SUPERVISOR] Clarity check parsing failed: {exc}")
+        return {"is_unclear": False, "needs_search": False, "search_queries": []}
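
The clarity check returns free-form model output, so the caller may want to validate `search_queries` before running additional searches. The sketch below shows one way to gate and clean that list. `plan_followup_searches` and the `max_queries` limit are assumptions for illustration; the commit does not show how many follow-up queries are executed.

```python
def plan_followup_searches(evaluation: dict, use_web_search: bool,
                           max_queries: int = 2) -> list:
    """Return cleaned follow-up queries, or [] when no search should run."""
    if not use_web_search or not evaluation.get("needs_search", False):
        return []
    seen = set()
    queries = []
    for query in evaluation.get("search_queries") or []:
        if not isinstance(query, str):
            continue  # ignore non-string entries from the model
        cleaned = query.strip()
        key = cleaned.lower()
        if cleaned and key not in seen:
            seen.add(key)
            queries.append(cleaned)
    return queries[:max_queries]


example = {"needs_search": True, "search_queries": ["a", " a ", "", "b", "c"]}
followups = plan_followup_searches(example, use_web_search=True)
```

Here duplicates and empty strings are dropped before the limit is applied, so `followups` holds the first two distinct queries.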
+def gemini_supervisor_synthesize(query: str, medswin_answers: list, rag_contexts: list, search_contexts: list, breakdown: dict) -> str:
+    """Wrapper to synthesize answer synchronously"""
+    if not MCP_AVAILABLE:
+        logger.warning("[GEMINI SUPERVISOR] MCP unavailable for synthesis, using simple concatenation")
+        return "\n\n".join(medswin_answers)
+    try:
+        loop = asyncio.get_event_loop()
+        if loop.is_running():
+            try:
+                import nest_asyncio
+                return nest_asyncio.run(gemini_supervisor_synthesize_async(query, medswin_answers, rag_contexts, search_contexts, breakdown))
+            except Exception as exc:
+                logger.error(f"[GEMINI SUPERVISOR] Nested synthesis failed: {exc}")
+                return "\n\n".join(medswin_answers)
+        return loop.run_until_complete(gemini_supervisor_synthesize_async(query, medswin_answers, rag_contexts, search_contexts, breakdown))
+    except Exception as exc:
+        logger.error(f"[GEMINI SUPERVISOR] Synthesis failed: {exc}")
+        return "\n\n".join(medswin_answers)
+def gemini_supervisor_challenge(query: str, current_answer: str, medswin_answers: list, rag_contexts: list, search_contexts: list) -> dict:
+    """Wrapper to challenge answer synchronously"""
+    if not MCP_AVAILABLE:
+        return {"is_optimal": True, "completeness_score": 7, "accuracy_score": 7, "clarity_score": 7, "missing_aspects": [], "inaccuracies": [], "improvement_suggestions": [], "needs_more_context": False, "enhancement_instructions": ""}
+    try:
+        loop = asyncio.get_event_loop()
+        if loop.is_running():
+            try:
+                import nest_asyncio
+                return nest_asyncio.run(gemini_supervisor_challenge_async(query, current_answer, medswin_answers, rag_contexts, search_contexts))
+            except Exception as exc:
+                logger.error(f"[GEMINI SUPERVISOR] Nested challenge failed: {exc}")
+                return {"is_optimal": True, "completeness_score": 7, "accuracy_score": 7, "clarity_score": 7, "missing_aspects": [], "inaccuracies": [], "improvement_suggestions": [], "needs_more_context": False, "enhancement_instructions": ""}
+        return loop.run_until_complete(gemini_supervisor_challenge_async(query, current_answer, medswin_answers, rag_contexts, search_contexts))
+    except Exception as exc:
+        logger.error(f"[GEMINI SUPERVISOR] Challenge failed: {exc}")
+        return {"is_optimal": True, "completeness_score": 7, "accuracy_score": 7, "clarity_score": 7, "missing_aspects": [], "inaccuracies": [], "improvement_suggestions": [], "needs_more_context": False, "enhancement_instructions": ""}
+def gemini_supervisor_enhance_answer(query: str, current_answer: str, enhancement_instructions: str, medswin_answers: list, rag_contexts: list, search_contexts: list) -> str:
+    """Wrapper to enhance answer synchronously"""
+    if not MCP_AVAILABLE:
+        return current_answer
+    try:
+        loop = asyncio.get_event_loop()
+        if loop.is_running():
+            try:
+                import nest_asyncio
+                return nest_asyncio.run(gemini_supervisor_enhance_answer_async(query, current_answer, enhancement_instructions, medswin_answers, rag_contexts, search_contexts))
+            except Exception as exc:
+                logger.error(f"[GEMINI SUPERVISOR] Nested enhancement failed: {exc}")
+                return current_answer
+        return loop.run_until_complete(gemini_supervisor_enhance_answer_async(query, current_answer, enhancement_instructions, medswin_answers, rag_contexts, search_contexts))
+    except Exception as exc:
+        logger.error(f"[GEMINI SUPERVISOR] Enhancement failed: {exc}")
+        return current_answer
+def gemini_supervisor_check_clarity(query: str, answer: str, use_web_search: bool) -> dict:
+    """Wrapper to check clarity synchronously"""
+    if not MCP_AVAILABLE or not use_web_search:
+        return {"is_unclear": False, "needs_search": False, "search_queries": []}
+    try:
+        loop = asyncio.get_event_loop()
+        if loop.is_running():
+            try:
+                import nest_asyncio
+                return nest_asyncio.run(gemini_supervisor_check_clarity_async(query, answer, use_web_search))
+            except Exception as exc:
+                logger.error(f"[GEMINI SUPERVISOR] Nested clarity check failed: {exc}")
+                return {"is_unclear": False, "needs_search": False, "search_queries": []}
+        return loop.run_until_complete(gemini_supervisor_check_clarity_async(query, answer, use_web_search))
+    except Exception as exc:
+        logger.error(f"[GEMINI SUPERVISOR] Clarity check failed: {exc}")
+        return {"is_unclear": False, "needs_search": False, "search_queries": []}
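
Review note on the four sync wrappers: to my knowledge the `nest_asyncio` package is used through `nest_asyncio.apply()`, and I could not confirm a module-level `nest_asyncio.run`. If that attribute is missing in the installed version, the nested branch would raise `AttributeError` and always take the fallback path when a loop is already running. The sketch below shows one standard-library alternative: run the coroutine on a worker thread with its own event loop. `run_coro_sync` is a hypothetical helper, not code from the commit, and it assumes the coroutine does not depend on objects bound to the caller's loop (which may not hold for an MCP session).

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def run_coro_sync(coro_factory, fallback):
    """Run `coro_factory()` to completion from sync code; return `fallback` on error.

    `coro_factory` is a zero-argument callable returning a coroutine, so the
    coroutine is only created in the thread that will actually run it.
    """
    try:
        try:
            asyncio.get_running_loop()
        except RuntimeError:
            # No loop is running in this thread, so asyncio.run is safe.
            return asyncio.run(coro_factory())
        # A loop is already running here: use a worker thread with its own loop.
        # This blocks the calling thread (and its loop) until the result is ready.
        with ThreadPoolExecutor(max_workers=1) as pool:
            return pool.submit(lambda: asyncio.run(coro_factory())).result()
    except Exception:
        return fallback


async def sample_call():
    await asyncio.sleep(0)
    return "synthesized"


result_without_loop = run_coro_sync(sample_call, fallback="concatenated")
```

The four wrappers in the diff differ only in the coroutine they call and the fallback value they return, so a helper of this shape could replace the repeated `try`/`except` blocks.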
 async def self_reflection_gemini(answer: str, query: str) -> dict:
     """Self-reflection using Gemini MCP"""
     reflection_prompt = f"""Evaluate this medical answer for quality and completeness:
         # Simple breakdown for direct mode
         breakdown = {
             "sub_topics": [
+                {"id": 1, "topic": "Answer", "instruction": message, "expected_tokens": 400, "priority": "high", "approach": "direct answer"}
             ],
+            "strategy": "Direct answer",
+            "exploration_note": "Direct mode - no breakdown"
         }
     else:
         logger.info("[GEMINI SUPERVISOR] Breaking query into sub-topics...")
             # Continue with next task
             continue
+    # ===== STEP 5: GEMINI SUPERVISOR - Synthesize final answer with clear context =====
+    logger.info("[GEMINI SUPERVISOR] Synthesizing final answer from all MedSwin responses...")
+    raw_medswin_answers = [ans.split('\n\n', 1)[1] if '\n\n' in ans else ans for ans in medswin_answers]  # Remove headers for synthesis
+    final_answer = gemini_supervisor_synthesize(message, raw_medswin_answers, rag_contexts, search_contexts, breakdown)
+    if not final_answer or len(final_answer.strip()) < 50:
+        # Fallback to simple concatenation if synthesis fails
+        logger.warning("[GEMINI SUPERVISOR] Synthesis failed or too short, using concatenation")
+        final_answer = "\n\n".join(medswin_answers) if medswin_answers else "I apologize, but I was unable to generate a response."
     # Clean final answer - ensure no tables, only Markdown bullets
     if "|" in final_answer and "---" in final_answer:
         logger.warning("[MEDSWIN] Final answer contains tables, converting to bullets")
                 cleaned_lines.append(line)
         final_answer = '\n'.join(cleaned_lines)
+    # ===== STEP 6: GEMINI SUPERVISOR - Challenge and enhance answer iteratively =====
+    max_challenge_iterations = 2  # Limit iterations to avoid timeout
+    challenge_iteration = 0
+    while challenge_iteration < max_challenge_iterations and elapsed() < soft_timeout - 15:
+        challenge_iteration += 1
+        logger.info(f"[GEMINI SUPERVISOR] Challenge iteration {challenge_iteration}/{max_challenge_iterations}...")
+        evaluation = gemini_supervisor_challenge(message, final_answer, raw_medswin_answers, rag_contexts, search_contexts)
+        if evaluation.get("is_optimal", False):
+            logger.info(f"[GEMINI SUPERVISOR] Answer confirmed optimal after {challenge_iteration} iteration(s)")
+            break
+        enhancement_instructions = evaluation.get("enhancement_instructions", "")
+        if not enhancement_instructions:
+            logger.info("[GEMINI SUPERVISOR] No enhancement instructions, considering answer optimal")
+            break
+        logger.info("[GEMINI SUPERVISOR] Enhancing answer based on feedback...")
+        enhanced_answer = gemini_supervisor_enhance_answer(
+            message, final_answer, enhancement_instructions, raw_medswin_answers, rag_contexts, search_contexts
+        )
+        if enhanced_answer and len(enhanced_answer.strip()) > len(final_answer.strip()) * 0.8:  # Accept only if the enhancement keeps more than 80% of the current length
+            final_answer = enhanced_answer
+            logger.info(f"[GEMINI SUPERVISOR] Answer enhanced (new length: {len(final_answer)} chars)")
+        else:
+            logger.info("[GEMINI SUPERVISOR] Enhancement did not improve answer significantly, stopping")
+            break
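The challenge loop in STEP 6 can be read as a bounded refine loop with three exits: the evaluator says the answer is optimal, no instructions are returned, or the candidate shrinks too much. A minimal sketch of that control flow in isolation, assuming hypothetical `evaluate` and `enhance` callables that stand in for `gemini_supervisor_challenge` and `gemini_supervisor_enhance_answer` (the function name and parameters below are illustrative, not from app.py):

```python
def refine_answer(answer, evaluate, enhance, max_iterations=2,
                  min_length_ratio=0.8, has_time=lambda: True):
    """Iteratively improve `answer`; stop early on any of the exit conditions.

    Hypothetical sketch: `evaluate(answer)` returns a dict with the keys
    "is_optimal" and "enhancement_instructions"; `enhance(answer, instructions)`
    returns a candidate string.
    """
    for _ in range(max_iterations):
        if not has_time():
            break  # time budget exhausted
        evaluation = evaluate(answer)
        if evaluation.get("is_optimal", False):
            break  # evaluator is satisfied
        instructions = evaluation.get("enhancement_instructions", "")
        if not instructions:
            break  # nothing actionable to apply
        candidate = enhance(answer, instructions)
        # Reject empty candidates or ones much shorter than the current answer.
        if not candidate or len(candidate.strip()) <= len(answer.strip()) * min_length_ratio:
            break
        answer = candidate
    return answer
```

The length-ratio guard is only a cheap proxy: it catches truncated or empty rewrites but says nothing about whether the longer text is actually better, which is why the evaluator runs again on the next iteration.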
+    # ===== STEP 7: Conditional search trigger (only when search mode enabled) =====
+    if final_use_web_search and elapsed() < soft_timeout - 10:
+        logger.info("[GEMINI SUPERVISOR] Checking if additional search is needed...")
+        clarity_check = gemini_supervisor_check_clarity(message, final_answer, final_use_web_search)
+        if clarity_check.get("needs_search", False) and clarity_check.get("search_queries"):
+            logger.info(f"[GEMINI SUPERVISOR] Triggering additional search: {clarity_check.get('search_queries', [])}")
+            additional_search_results = []
+            for search_query in clarity_check.get("search_queries", [])[:3]:  # Limit to 3 additional searches
+                if elapsed() >= soft_timeout - 5:
+                    break
+                results = search_web(search_query, max_results=2)
+                additional_search_results.extend(results)
+                web_urls.extend([r.get('url', '') for r in results if r.get('url')])
+            if additional_search_results:
+                logger.info(f"[GEMINI SUPERVISOR] Summarizing {len(additional_search_results)} additional search results...")
+                additional_summary = summarize_web_content(additional_search_results, message)
+                if additional_summary:
+                    # Enhance answer with additional search context
+                    search_contexts.append(additional_summary)
+                    logger.info("[GEMINI SUPERVISOR] Enhancing answer with additional search context...")
+                    enhanced_with_search = gemini_supervisor_enhance_answer(
+                        message, final_answer,
+                        f"Incorporate the following additional information from web search: {additional_summary}",
+                        raw_medswin_answers, rag_contexts, search_contexts
+                    )
+                    if enhanced_with_search and len(enhanced_with_search.strip()) > 50:
+                        final_answer = enhanced_with_search
+                        logger.info("[GEMINI SUPERVISOR] Answer enhanced with additional search context")
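The follow-up search in STEP 7 caps both the number of queries and the time spent. A minimal sketch of that collection step on its own, assuming a hypothetical `search` callable that returns a list of dicts with an optional `url` key (standing in for `search_web`) and an `elapsed` callable like the one used above:

```python
def collect_followup_results(queries, search, elapsed, deadline, max_queries=3):
    """Run at most `max_queries` searches, stopping once `deadline` is reached.

    Hypothetical sketch; returns (results, urls) so the caller can decide
    whether to summarize and re-enhance the answer.
    """
    results, urls = [], []
    for query in queries[:max_queries]:
        if elapsed() >= deadline:
            break  # leave time for summarization and final formatting
        hits = search(query)
        results.extend(hits)
        # Keep only non-empty URLs for the citation list.
        urls.extend(hit["url"] for hit in hits if hit.get("url"))
    return results, urls
```

Returning the URLs separately keeps citation bookkeeping out of the search loop, so an empty result list can short-circuit the summarization call.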
+    # ===== STEP 8: Finalize answer (translate, add citations, format) =====
+    citations_text = ""
     # Translate back if needed
     if needs_translation and final_answer:
         logger.info(f"[GEMINI SUPERVISOR] Translating response back to {original_lang}...")