TiramisuQiao committed
Commit 546431c · verified · 1 Parent(s): 640a2e2

Upload 3 files

Files changed (4)
  1. .gitattributes +1 -0
  2. README.md +328 -1
  3. README_CN.md +324 -0
  4. main_results.png +3 -0
.gitattributes CHANGED
@@ -34,3 +34,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
  tokenizer.json filter=lfs diff=lfs merge=lfs -text
37
+ main_results.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -243,4 +243,331 @@ model-index:
243
  value: 92.2
244
  verified: false
245
 
246
+ ---
247
+
248
+ # MedGo: Medical Large Language Model Based on Qwen2.5-32B
249
+
250
+ <div align="center">
251
+
252
+ [![HuggingFace](https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Model-yellow)](https://huggingface.co/OpenMedZoo/MedGo)
253
+ [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)
254
+ [![Python](https://img.shields.io/badge/Python-3.8+-blue.svg)](https://www.python.org/)
255
+
256
+
257
+ English | [简体中文](./README_CN.md)
258
+
259
+ </div>
260
+
261
+ ## 📋 Table of Contents
262
+
263
+ - [Introduction](#introduction)
264
+ - [Key Features](#key-features)
265
+ - [Performance](#performance)
266
+ - [Quick Start](#quick-start)
267
+ - [Training Details](#training-details)
268
+ - [Use Cases](#use-cases)
269
+ - [Limitations & Risks](#limitations--risks)
270
+ - [Citation](#citation)
271
+ - [License](#license)
272
+ - [Contributing](#contributing)
273
+ - [Contact](#contact)
274
+
275
+ ## 🎯 Introduction
276
+
277
+ **MedGo** is a general-purpose medical large language model fine-tuned from **Qwen2.5-32B**, designed for clinical medicine and research scenarios. The model is trained on large-scale, multi-source medical corpora and further enhanced with complex case data, and it supports medical Q&A, clinical record summarization, clinical reasoning, multi-turn dialogue, and scientific text generation.
278
+
279
+ ### 🌟 Core Capabilities
280
+
281
+ - **📚 Medical Knowledge Q&A**: Professional responses based on authoritative medical literature and clinical guidelines
282
+ - **📝 Clinical Documentation**: Automated medical record summaries, diagnostic reports, and medical documentation
283
+ - **🔍 Clinical Reasoning**: Differential diagnosis, examination recommendations, and treatment suggestions
284
+ - **💬 Multi-turn Dialogue**: Patient-doctor interaction simulation and complex case discussions
285
+ - **🔬 Research Support**: Literature summarization, research idea generation, and quality control review
286
+
287
+ ## ✨ Key Features
288
+
289
+ | Feature | Details |
290
+ |---------|---------|
291
+ | **Base Architecture** | Qwen2.5-32B |
292
+ | **Parameters** | 32B |
293
+ | **Domain** | Clinical Medicine, Research Support, Healthcare System Integration |
294
+ | **Fine-tuning Method** | SFT + Preference Alignment (DPO/KTO) |
295
+ | **Data Sources** | Authoritative medical literature, clinical guidelines, real cases (anonymized) |
296
+ | **Deployment** | Local deployment, HIS/EMR system integration |
297
+ | **License** | Apache 2.0 |
298
+
299
+ ## 📊 Performance
300
+
301
+ MedGo performs strongly across multiple medical and general evaluation benchmarks, with competitive results among models in the ~30B-parameter class:
302
+
303
+ ### Key Benchmark Results
304
+
305
+ - **AIMedQA**: Medical question answering comprehension
306
+ - **CME**: Clinical reasoning evaluation
307
+ - **DiagnosisArena**: Diagnostic capability assessment
308
+ - **MedQA / MedMCQA**: Medical multiple-choice questions
309
+ - **PubMedQA**: Biomedical literature Q&A
310
+ - **MMLU-Pro**: Comprehensive capability evaluation
311
+
312
+ ![Performance Comparison](./main_results.png)
313
+
314
+ **Performance Highlights**:
315
+ - ✅ **Average Score**: ~70 points (excellent performance in the 30B parameter class)
316
+ - ✅ **Strong Tasks**: Clinical reasoning (DiagnosisArena, CME) and multi-turn medical Q&A
317
+ - ✅ **Balanced Capability**: Good performance in medical semantic understanding and multi-task generalization
318
+
319
+
320
+ ## 🚀 Quick Start
321
+
322
+ ### Requirements
323
+
324
+ - Python >= 3.8
325
+ - PyTorch >= 2.0
326
+ - Transformers >= 4.35.0
327
+ - CUDA >= 11.8 (for GPU inference)
328
+
329
+ ### Installation
330
+
331
+ ```bash
332
+ # Clone the repository
333
+ git clone https://github.com/OpenMedZoo/MedGo.git
334
+ cd MedGo
335
+
336
+ # Install dependencies
337
+ pip install -r requirements.txt
338
+ ```
339
+
340
+ ### Model Download
341
+
342
+ Download model weights from HuggingFace:
343
+
344
+ ```bash
345
+ # Using huggingface-cli
346
+ huggingface-cli download OpenMedZoo/MedGo --local-dir ./models/MedGo
347
+
348
+ # Or using git-lfs
349
+ git lfs install
350
+ git clone https://huggingface.co/OpenMedZoo/MedGo
351
+ ```
352
+
353
+ ### Basic Inference
354
+
355
+ ```python
356
+ from transformers import AutoModelForCausalLM, AutoTokenizer
357
+
358
+ # Load model and tokenizer
359
+ model_path = "OpenMedZoo/MedGo"
360
+ tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
361
+ model = AutoModelForCausalLM.from_pretrained(
362
+     model_path,
363
+     device_map="auto",
364
+     trust_remote_code=True,
365
+     torch_dtype="auto"
366
+ )
367
+
368
+ # Medical Q&A example
369
+ messages = [
370
+     {"role": "system", "content": "You are a professional medical assistant. Please answer questions based on medical knowledge."},
371
+     {"role": "user", "content": "What is hypertension and what are the common treatment methods?"}
372
+ ]
373
+
374
+ # Generate response
375
+ inputs = tokenizer.apply_chat_template(
376
+     messages,
377
+     tokenize=True,
378
+     add_generation_prompt=True,
379
+     return_tensors="pt"
380
+ ).to(model.device)
381
+
382
+ outputs = model.generate(
383
+     inputs,
384
+     max_new_tokens=512,
385
+     temperature=0.7,
386
+     top_p=0.9,
387
+     do_sample=True
388
+ )
389
+
390
+ response = tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True)
391
+ print(response)
392
+ ```
393
+
394
+ ### Batch Inference
395
+
396
+ ```bash
397
+ # Use the provided inference script
398
+ python scripts/inference.py \
399
+ --model_path OpenMedZoo/MedGo \
400
+ --input_file examples/medical_qa.jsonl \
401
+ --output_file results/predictions.jsonl \
402
+ --batch_size 4
403
+ ```
404
+
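+ The exact input schema of `scripts/inference.py` is not documented in this card. As a minimal sketch, assuming the script expects one JSON object per line with a `prompt` field (adjust the field names to whatever the script actually reads), the input file could be built like this:
+
+ ```python
+ # Sketch: write a minimal JSONL input file for batch inference.
+ # The "id" and "prompt" field names are assumptions, not a documented schema.
+ import json
+ import os
+
+ records = [
+     {"id": 1, "prompt": "What is hypertension and what are the common treatment methods?"},
+     {"id": 2, "prompt": "What are the symptoms and treatment methods for diabetes?"},
+ ]
+
+ os.makedirs("examples", exist_ok=True)
+ with open("examples/medical_qa.jsonl", "w", encoding="utf-8") as f:
+     for rec in records:
+         f.write(json.dumps(rec, ensure_ascii=False) + "\n")
+ ```
+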
405
+ ### Accelerated Inference with vLLM
406
+
407
+ ```python
408
+ from vllm import LLM, SamplingParams
409
+
410
+ # Initialize vLLM
411
+ llm = LLM(model="OpenMedZoo/MedGo", trust_remote_code=True)
412
+ sampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=512)
413
+
414
+ # Batch inference
415
+ prompts = [
416
+     "What are the symptoms and treatment methods for diabetes?",
417
+     "What dietary precautions should hypertensive patients take?"
418
+ ]
419
+
420
+ outputs = llm.generate(prompts, sampling_params)
421
+ for output in outputs:
422
+     print(output.outputs[0].text)
423
+ ```
424
+
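+ The vLLM example above sends raw strings. For a chat-tuned model such as MedGo, responses are usually better when each question is first rendered through the chat template, as in the following sketch (it assumes the tokenizer shipped with the model defines a chat template):
+
+ ```python
+ from transformers import AutoTokenizer
+ from vllm import LLM, SamplingParams
+
+ tokenizer = AutoTokenizer.from_pretrained("OpenMedZoo/MedGo", trust_remote_code=True)
+ llm = LLM(model="OpenMedZoo/MedGo", trust_remote_code=True)
+ sampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=512)
+
+ questions = [
+     "What are the symptoms and treatment methods for diabetes?",
+     "What dietary precautions should hypertensive patients take?"
+ ]
+
+ # Render each question through the chat template before generation.
+ prompts = [
+     tokenizer.apply_chat_template(
+         [{"role": "user", "content": q}],
+         tokenize=False,
+         add_generation_prompt=True,
+     )
+     for q in questions
+ ]
+
+ outputs = llm.generate(prompts, sampling_params)
+ for output in outputs:
+     print(output.outputs[0].text)
+ ```
+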
425
+ ## 🔧 Training Details
426
+
427
+ MedGo employs a **two-stage fine-tuning strategy** to balance general medical knowledge with clinical task adaptation.
428
+
429
+ ### Stage I: General Medical Alignment
430
+
431
+ **Objective**: Establish a solid foundation of medical knowledge and improve Q&A standardization
432
+
433
+ - **Data Sources**:
434
+ - Authoritative medical literature (PubMed, medical textbooks)
435
+ - Clinical guidelines and diagnostic standards
436
+ - Medical encyclopedia entries and terminology databases
437
+
438
+ - **Training Methods**:
439
+ - Supervised Fine-Tuning (SFT)
440
+   - Chain-of-Thought (CoT) guided samples (an example record is sketched after this list)
441
+ - Medical terminology alignment and safety constraints
442
+
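+ To make the Stage I data format concrete, below is a minimal sketch of a single CoT-guided SFT record. The field names and content are illustrative assumptions rather than the project's actual schema:
+
+ ```python
+ # Sketch of one CoT-guided SFT training record (field names are assumptions).
+ sft_record = {
+     "messages": [
+         {"role": "system", "content": "You are a professional medical assistant."},
+         {"role": "user", "content": "A patient presents with polyuria, polydipsia, and weight loss. What is the likely diagnosis?"},
+         {
+             "role": "assistant",
+             # Chain-of-thought reasoning first, then the final answer.
+             "content": (
+                 "Reasoning: polyuria, polydipsia, and unexplained weight loss form the classic triad of hyperglycemia; "
+                 "confirm with fasting plasma glucose or HbA1c.\n"
+                 "Answer: the presentation is most consistent with diabetes mellitus."
+             ),
+         },
+     ],
+     "source": "clinical_guideline",  # provenance tag (assumed)
+ }
+
+ print(sft_record["messages"][-1]["content"])
+ ```
+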
443
+ ### Stage II: Clinical Task Enhancement
444
+
445
+ **Objective**: Enhance complex case reasoning and multi-task processing capabilities
446
+
447
+ - **Data Sources**:
448
+ - Real medical records (fully anonymized)
449
+ - Outpatient and emergency records with complex multi-diagnosis samples
450
+ - Research articles and quality control cases
451
+
452
+ - **Data Augmentation Techniques**:
453
+ - Semantic paraphrasing and multi-perspective expansion
454
+ - Complex case synthesis
455
+ - Doctor-patient interaction simulation
456
+
457
+ - **Training Methods**:
458
+ - Multi-Task Learning (medical record summary, differential diagnosis, examination suggestions, etc.)
459
+   - Preference Alignment (DPO/KTO), illustrated by the loss sketch after this list
460
+ - Expert feedback iterative optimization
461
+
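+ As an illustration of the preference-alignment step referenced above, here is a minimal PyTorch sketch of the DPO objective. It is not the project's training code; the inputs stand in for per-sequence sums of token log-probabilities under the policy and a frozen reference model:
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+
+ def dpo_loss(policy_chosen_logps, policy_rejected_logps,
+              ref_chosen_logps, ref_rejected_logps, beta=0.1):
+     """Direct Preference Optimization loss for a batch of (chosen, rejected) pairs."""
+     # Log-ratio of the policy against the frozen reference model for each response.
+     chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
+     rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
+     # Push the policy to prefer the chosen response over the rejected one.
+     return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
+
+ # Toy call with random log-probabilities for a batch of 4 preference pairs.
+ loss = dpo_loss(torch.randn(4), torch.randn(4), torch.randn(4), torch.randn(4))
+ print(loss.item())
+ ```
+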
462
+ ### Training Optimization Focus
463
+
464
+ - ✅ Strengthen information extraction and cross-evidence reasoning for complex cases
465
+ - ✅ Improve medical consistency and interpretability of outputs
466
+ - ✅ Optimize expression compliance and safety
467
+ - ✅ Continuous iteration through expert samples and automated evaluation
468
+
469
+ ## 💡 Use Cases
470
+
471
+ ### ✅ Suitable Scenarios
472
+
473
+ | Scenario | Description |
474
+ |----------|-------------|
475
+ | **Clinical Assistance** | Preliminary diagnosis suggestions, medical record writing, formatted report generation |
476
+ | **Research Support** | Literature summarization, research idea generation, data analysis assistance |
477
+ | **Quality Control** | Medical document compliance checking, clinical process quality control |
478
+ | **System Integration** | Embedded in HIS/EMR systems to provide intelligent decision support (see the request sketch below this table) |
479
+ | **Medical Education** | Case discussions, medical knowledge Q&A, clinical reasoning training |
480
+
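+ For the system-integration scenario, a common pattern is to host MedGo locally behind an OpenAI-compatible endpoint (for example, vLLM's API server) and have the HIS/EMR integration layer call it over HTTP. The sketch below assumes such an endpoint is already running at `http://localhost:8000/v1`; the host, port, and authentication are deployment-specific assumptions:
+
+ ```python
+ # Sketch: call a locally hosted, OpenAI-compatible MedGo endpoint from an integration layer.
+ import requests
+
+ payload = {
+     "model": "OpenMedZoo/MedGo",
+     "messages": [
+         {"role": "system", "content": "You are a professional medical assistant."},
+         {"role": "user", "content": "Summarize the key risks of uncontrolled hypertension."},
+     ],
+     "temperature": 0.7,
+     "max_tokens": 512,
+ }
+
+ resp = requests.post("http://localhost:8000/v1/chat/completions", json=payload, timeout=60)
+ resp.raise_for_status()
+ print(resp.json()["choices"][0]["message"]["content"])
+ ```
+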
481
+ ### 🚫 Unsuitable Scenarios
482
+
483
+ - ❌ **Cannot Replace Doctors**: Only an auxiliary tool, not a standalone diagnostic basis
484
+ - ❌ **High-Risk Operations**: Not recommended for surgical decisions or other high-risk medical operations
485
+ - ❌ **Rare Disease Limitations**: May perform poorly on rare diseases outside training data
486
+ - ❌ **Emergency Care**: Not suitable for scenarios requiring immediate decisions
487
+
488
+ ## ⚠️ Limitations & Risks
489
+
490
+ ### Model Limitations
491
+
492
+ 1. **Understanding Bias**: Despite broad coverage of medical knowledge, the model may still misinterpret inputs or produce incorrect recommendations
493
+ 2. **Complex Cases**: The risk of error is higher for cases with complex conditions, severe complications, or missing information
494
+ 3. **Knowledge Currency**: Medical knowledge is continuously updated, and the training data may lag behind current guidelines
495
+ 4. **Language Limitation**: Primarily designed for Chinese medical scenarios; performance in other languages may vary
496
+
497
+ ### Usage Recommendations
498
+
499
+ - ⚠️ Use in controlled environments with clinical expert review of generated results
500
+ - ⚠️ Treat model outputs as auxiliary references, not final diagnostic conclusions
501
+ - ⚠️ For sensitive cases or high-risk scenarios, expert consultation is mandatory
502
+ - ⚠️ Deployment requires internal validation, security review, and clinical testing
503
+
504
+ ### Data Privacy & Compliance
505
+
506
+ - 🔒 Training data fully anonymized
507
+ - 🔒 Protect patient privacy when using the model
508
+ - 🔒 Production deployment must comply with healthcare data security regulations (e.g., HIPAA, GDPR)
509
+ - 🔒 Local deployment is recommended to avoid transmitting sensitive data
510
+
511
+ ## 📚 Citation
512
+
513
+ If MedGo is helpful for your research or project, please cite our work:
514
+
515
+ ```bibtex
516
+ @misc{openmedzoo_2025,
517
+ author = { OpenMedZoo },
518
+ title = { MedGo (Revision 640a2e2) },
519
+ year = 2025,
520
+ url = { https://huggingface.co/OpenMedZoo/MedGo },
521
+ doi = { 10.57967/hf/7024 },
522
+ publisher = { Hugging Face }
523
+ }
524
+ ```
525
+
526
+ ## 📄 License
527
+
528
+ This project is licensed under the [Apache License 2.0](LICENSE).
529
+
530
+ **Commercial Use Notice**:
531
+ - ✅ Commercial use and modification allowed
532
+ - ✅ Original license and copyright notice must be retained
533
+ - ✅ Contact us for technical support when integrating into healthcare systems
534
+
535
+ ## 🤝 Contributing
536
+
537
+ We welcome community contributions! Here's how to participate:
538
+
539
+ ### Contribution Types
540
+
541
+ - 🐛 Submit bug reports
542
+ - 💡 Propose new features
543
+ - 📝 Improve documentation
544
+ - 🔧 Submit code fixes or optimizations
545
+ - 📊 Share evaluation results and use cases
546
+
547
+
548
+ ## 🙏 Acknowledgments
549
+
550
+ Thanks to all contributors to the MedGo project:
551
+
552
+ - Model development and fine-tuning algorithm team
553
+ - Data annotation and quality control team
554
+ - Clinical expert guidance and review team
555
+ - Open-source community support and feedback
556
+
557
+ Special thanks to:
558
+ - [Qwen Team](https://github.com/QwenLM/Qwen) for providing excellent foundation models
559
+ - All healthcare institutions that provided data and feedback
560
+
561
+ ## 📧 Contact
562
+
563
+ - **HuggingFace**: [Model Homepage](https://huggingface.co/OpenMedZoo/MedGo)
564
+
565
+ ---
566
+
567
+ <div align="center">
568
+
569
+ [⬆ Back to Top](#medgo-medical-large-language-model-based-on-qwen25-32b)
570
+
571
+ </div>
572
+
573
+
README_CN.md ADDED
@@ -0,0 +1,324 @@
1
+ # MedGo: 基于 Qwen2.5-32B 的医疗大模型
2
+
3
+ <div align="center">
4
+
5
+ [![HuggingFace](https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Model-yellow)](https://huggingface.co/OpenMedZoo/MedGo)
6
+ [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)
7
+ [![Python](https://img.shields.io/badge/Python-3.8+-blue.svg)](https://www.python.org/)
8
+
9
+ [English](./README.md) | 简体中文
10
+
11
+ </div>
12
+
13
+
14
+ ## 📋 目录
15
+
16
+ - [简介](#简介)
17
+ - [模型特点](#模型特点)
18
+ - [性能评估](#性能评估)
19
+ - [快速开始](#快速开始)
20
+ - [训练细节](#训练细节)
21
+ - [使用场景](#使用场景)
22
+ - [限制与风险](#限制与风险)
23
+ - [引用](#引用)
24
+ - [许可证](#许可证)
25
+ - [贡献](#贡献)
26
+ - [联系方式](#联系方式)
27
+
28
+ ## 🎯 简介
29
+
30
+ **MedGo** 是一个基于 **Qwen2.5-32B** 微调的通用医疗大语言模型,专为临床医学与科研场景设计。模型通过大规模多源医学语料和复杂病例数据增强进行训练,支持医学问答、病历摘要、临床推理、多轮对话和科研文本生成等多任务能力。
31
+
32
+ ### 🌟 核心能力
33
+
34
+ - **📚 医学知识问答**: 基于权威医学文献和临床指南的专业问答
35
+ - **📝 病历文书生成**: 自动化病历摘要、诊断报告和医疗文书
36
+ - **🔍 临床推理**: 鉴别诊断、检查建议和治疗方案推荐
37
+ - **💬 多轮对话**: 医患交互模拟和复杂病例讨论
38
+ - **🔬 科研辅助**: 文献摘要、研究思路生成和质控审查
39
+
40
+ ## ✨ 模型特点
41
+
42
+ | 特性 | 详情 |
43
+ |------|------|
44
+ | **基础架构** | Qwen2.5-32B |
45
+ | **参数规模** | 32B |
46
+ | **应用领域** | 临床医学、科研辅助、医疗系统集成 |
47
+ | **微调方法** | SFT + Preference Alignment (DPO/KTO) |
48
+ | **数据来源** | 权威医学文献、临床指南、真实病例(脱敏) |
49
+ | **部署方式** | 本地部署、HIS/EMR 系统集成 |
50
+ | **开源许可** | Apache 2.0 |
51
+
52
+ ## 📊 性能评估
53
+
54
+ MedGo 在多项医学与综合评测基准上表现优异,在 30B 参数级别模型中具有竞争力:
55
+
56
+ ### 主要基准测试结果
57
+
58
+ - **AIMedQA**: 医学问答理解
59
+ - **CME**: 临床推理评估
60
+ - **DiagnosisArena**: 诊断能力测试
61
+ - **MedQA / MedMCQA**: 医学选择题
62
+ - **PubMedQA**: 生物医学文献问答
63
+ - **MMLU-Pro**: 综合能力评估
64
+
65
+ ![Performance Comparison](./main_results.png)
66
+
67
+ **性能亮点**:
68
+ - ✅ **平均得分**: 约 70 分(30B 级别模型中表现优异)
69
+ - ✅ **优势任务**: 临床推理(DiagnosisArena、CME)和多轮医学问答
70
+ - ✅ **平衡能力**: 在医疗语义理解和多任务泛化上表现良好
71
+
72
+
73
+ ## 🚀 快速开始
74
+
75
+ ### 环境要求
76
+
77
+ - Python >= 3.8
78
+ - PyTorch >= 2.0
79
+ - Transformers >= 4.35.0
80
+ - CUDA >= 11.8 (GPU 推理)
81
+
82
+ ### 安装
83
+
84
+ ```bash
85
+ # 克隆仓库
86
+ git clone https://github.com/OpenMedZoo/MedGo.git
87
+ cd MedGo
88
+
89
+ # 安装依赖
90
+ pip install -r requirements.txt
91
+ ```
92
+
93
+ ### 模型下载
94
+
95
+ 从 HuggingFace 下载模型权重:
96
+
97
+ ```bash
98
+ # 使用 huggingface-cli
99
+ huggingface-cli download OpenMedZoo/MedGo --local-dir ./models/MedGo
100
+
101
+ # 或使用 git-lfs
102
+ git lfs install
103
+ git clone https://huggingface.co/OpenMedZoo/MedGo
104
+ ```
105
+
106
+ ### 基础推理
107
+
108
+ ```python
109
+ from transformers import AutoModelForCausalLM, AutoTokenizer
110
+
111
+ # 加载模型和分词器
112
+ model_path = "OpenMedZoo/MedGo"
113
+ tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
114
+ model = AutoModelForCausalLM.from_pretrained(
115
+     model_path,
116
+     device_map="auto",
117
+     trust_remote_code=True,
118
+     torch_dtype="auto"
119
+ )
120
+
121
+ # 医学问答示例
122
+ messages = [
123
+     {"role": "system", "content": "你是一个专业的医疗助手,请基于医学知识回答问题。"},
124
+     {"role": "user", "content": "请解释什么是高血压,以及常见的治疗方法。"}
125
+ ]
126
+
127
+ # 生成回复
128
+ inputs = tokenizer.apply_chat_template(
129
+     messages,
130
+     tokenize=True,
131
+     add_generation_prompt=True,
132
+     return_tensors="pt"
133
+ ).to(model.device)
134
+
135
+ outputs = model.generate(
136
+     inputs,
137
+     max_new_tokens=512,
138
+     temperature=0.7,
139
+     top_p=0.9,
140
+     do_sample=True
141
+ )
142
+
143
+ response = tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True)
144
+ print(response)
145
+ ```
146
+
147
+ ### 批量推理
148
+
149
+ ```bash
150
+ # 使用提供的推理脚本
151
+ python scripts/inference.py \
152
+ --model_path OpenMedZoo/MedGo \
153
+ --input_file examples/medical_qa.jsonl \
154
+ --output_file results/predictions.jsonl \
155
+ --batch_size 4
156
+ ```
157
+
158
+ ### vLLM 加速推理
159
+
160
+ ```python
161
+ from vllm import LLM, SamplingParams
162
+
163
+ # 初始化 vLLM
164
+ llm = LLM(model="OpenMedZoo/MedGo", trust_remote_code=True)
165
+ sampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=512)
166
+
167
+ # 批量推理
168
+ prompts = [
169
+     "请解释糖尿病的症状和治疗方法。",
170
+     "高血压患者应该注意哪些饮食事项?"
171
+ ]
172
+
173
+ outputs = llm.generate(prompts, sampling_params)
174
+ for output in outputs:
175
+     print(output.outputs[0].text)
176
+ ```
177
+
178
+ ## 🔧 训练细节
179
+
180
+ MedGo 采用**两阶段微调策略**,兼顾通用医学知识与临床任务适配。
181
+
182
+ ### 阶段 I:通识医学对齐
183
+
184
+ **目标**: 建立扎实的医学知识基础,提高问答规范性
185
+
186
+ - **数据来源**:
187
+ - 权威医学文献(PubMed、医学教科书)
188
+ - 临床指南和诊疗规范
189
+ - 医学百科条目和术语库
190
+
191
+ - **训练方法**:
192
+ - Supervised Fine-Tuning (SFT)
193
+ - Chain-of-Thought (CoT) 引导样本
194
+ - 医学术语对齐和安全性约束
195
+
196
+ ### 阶段 II:临床任务增强
197
+
198
+ **目标**: 增强复杂病例推理和多任务处理能力
199
+
200
+ - **数据来源**:
201
+ - 真实病历(完全脱敏处理)
202
+ - 门急诊记录和复杂多诊断样本
203
+ - 科研文章和质控案例
204
+
205
+ - **数据增强技术**:
206
+ - 语义改写和多视角扩写
207
+ - 复杂病例合成
208
+ - 医患交互模拟
209
+
210
+ - **训练方法**:
211
+ - Multi-Task Learning(病历摘要、鉴别诊断、检查建议等)
212
+ - Preference Alignment (DPO/KTO)
213
+ - 专家反馈迭代优化
214
+
215
+ ### 训练优化重点
216
+
217
+ - ✅ 强化复杂病例的信息抽取与跨证据推理
218
+ - ✅ 提升输出的医学一致性和可解释性
219
+ - ✅ 优化表达的合规性和安全性
220
+ - ✅ 通过专家样本和自动评测持续迭代
221
+
222
+ ## 💡 使用场景
223
+
224
+ ### ✅ 适用场景
225
+
226
+ | 场景 | 说明 |
227
+ |------|------|
228
+ | **临床辅助** | 初步诊断建议、病历书写、格式化报告生成 |
229
+ | **科研支持** | 文献摘要、研究思路生成、数据分析辅助 |
230
+ | **质控审查** | 医疗文书规范性检查、诊疗流程质控 |
231
+ | **系统集成** | 嵌入 HIS/EMR 系统,提供智能辅助决策 |
232
+ | **医学教育** | 病例讨论、医学知识问答、临床推理训练 |
233
+
234
+ ### 🚫 不适用场景
235
+
236
+ - ❌ **不能替代医生**: 仅为辅助工具,不能单独作为诊断依据
237
+ - ❌ **高风险操作**: 不建议用于手术决策等高风险医疗操作
238
+ - ❌ **罕见病局限**: 对训练数据外的罕见病表现可能欠佳
239
+ - ❌ **实时急救**: 不适用于需要即时决策的急救场景
240
+
241
+ ## ⚠️ 限制与风险
242
+
243
+ ### 模型局限性
244
+
245
+ 1. **理解偏差**: 虽已覆盖大量医学知识,仍可能出现理解偏差或错误推荐
246
+ 2. **复杂病例**: 对病情复杂、并发症严重、资料缺失的病例风险较高
247
+ 3. **知识时效**: 医学知识持续更新,模型训练数据可能滞后
248
+ 4. **语言限制**: 主要针对中文医学场景,其他语言表现可能不佳
249
+
250
+ ### 使用建议
251
+
252
+ - ⚠️ 请在受控环境中使用,并由临床专家审核生成结果
253
+ - ⚠️ 将模型输出作为辅助参考,而非最终诊断依据
254
+ - ⚠️ 对敏感病案或高风险场景,必须结合专家意见
255
+ - ⚠️ 部署前需通过内部验证、安全审查和临床测试
256
+
257
+ ### 数据隐私与合规
258
+
259
+ - 🔒 训练数据已完全脱敏处理
260
+ - 🔒 使用时注意患者隐私保护
261
+ - 🔒 生产环境部署需符合医疗数据安全法规(如 HIPAA、GDPR)
262
+ - 🔒 建议在本地部署,避免敏感数据外传
263
+
264
+ ## 📚 引用
265
+
266
+ 如果 MedGo 对您的研究或项目有帮助,请引用我们的工作:
267
+
268
+ ```bibtex
269
+ @misc{openmedzoo_2025,
270
+ author = { OpenMedZoo },
271
+ title = { MedGo (Revision 640a2e2) },
272
+ year = 2025,
273
+ url = { https://huggingface.co/OpenMedZoo/MedGo },
274
+ doi = { 10.57967/hf/7024 },
275
+ publisher = { Hugging Face }
276
+ }
277
+ ```
278
+
279
+ ## 📄 许可证
280
+
281
+ 本项目采用 [Apache License 2.0](LICENSE) 开源协议。
282
+
283
+ **商业使用须知**:
284
+ - ✅ 允许商业使用和修改
285
+ - ✅ 需保留原始许可证和版权声明
286
+ - ✅ 医疗系统集成建议联系我们获取技术支持
287
+
288
+ ## 🤝 贡献
289
+
290
+ 我们欢迎社区贡献!以下是参与方式:
291
+
292
+ ### 贡献类型
293
+
294
+ - 🐛 提交 Bug 报告
295
+ - 💡 提出新功能建议
296
+ - 📝 改进文档
297
+ - 🔧 提交代码修复或优化
298
+ - 📊 分享评测结果和使用案例
299
+
300
+ ## 🙏 致谢
301
+
302
+ 感谢所有参与 MedGo 项目的人员:
303
+
304
+ - 模型研发与微调算法团队
305
+ - 数据标注与质量控制团队
306
+ - 临床专家指导与审核团队
307
+ - 开源社区的支持与反馈
308
+
309
+ 特别感谢:
310
+ - [Qwen Team](https://github.com/QwenLM/Qwen) 提供优秀的基础模型
311
+ - 所有提供数据和反馈的医疗机构
312
+
313
+ ## 📧 联系方式
314
+
315
+ - **HuggingFace**: [模型主页](https://huggingface.co/OpenMedZoo/MedGo)
316
+
317
+ ---
318
+
319
+ <div align="center">
320
+
321
+
322
+ [⬆ 回到顶部](#medgo-基于-qwen25-32b-的医疗大模型)
323
+
324
+ </div>
main_results.png ADDED

Git LFS Details

  • SHA256: 87932bc2b934dc9992d8db349cb33a2cd21dba832d7ccbdcdd358848e4a005be
  • Pointer size: 132 Bytes
  • Size of remote file: 1.61 MB