TiramisuQiao committed
Commit 6f0b915 · verified · 1 parent: c74374c

Update README.md

Update model card with metrics

Files changed (1)
  1. README.md +246 -3
README.md CHANGED
@@ -1,3 +1,246 @@
- ---
- license: mit
- ---
+ ---
+ license: apache-2.0
+ language:
+ - zh
+ - en
+ metrics:
+ - accuracy
+ base_model:
+ - Qwen/Qwen3-30B-A3B-Instruct-2507
+ pipeline_tag: text-generation
+ library_name: transformers
+ tags:
+ - medical
+ model-index:
+ - name: Med-Go-32B
+   results:
+   # ----------------------------------------------------
+   # Medical Knowledge
+   # ----------------------------------------------------
+   - task:
+       type: text-generation
+     dataset:
+       type: medical_eval_hle
+       name: Medical-Eval-HLE
+     metrics:
+     - name: accuracy
+       type: accuracy
+       value: 19.4
+       verified: false
+
+   - task:
+       type: text-generation
+     dataset:
+       type: supergpqa
+       name: SuperGPQA
+     metrics:
+     - name: accuracy
+       type: accuracy
+       value: 37.2
+       verified: false
+
+   - task:
+       type: text-generation
+     dataset:
+       type: medbullets
+       name: Medbullets
+     metrics:
+     - name: accuracy
+       type: accuracy
+       value: 64.3
+       verified: false
+
+   - task:
+       type: text-generation
+     dataset:
+       type: mmlu_pro
+       name: MMLU-pro
+     metrics:
+     - name: accuracy
+       type: accuracy
+       value: 74.7
+       verified: false
+
+   - task:
+       type: text-generation
+     dataset:
+       type: afrimedqa
+       name: AfrimedQA
+     metrics:
+     - name: accuracy
+       type: accuracy
+       value: 74.7
+       verified: false
+
+   - task:
+       type: text-generation
+     dataset:
+       type: medmcqa
+       name: MedMCQA
+     metrics:
+     - name: accuracy
+       type: accuracy
+       value: 68.3
+       verified: false
+
+   - task:
+       type: text-generation
+     dataset:
+       type: medqa_usmle
+       name: MedQA-USMLE
+     metrics:
+     - name: accuracy
+       type: accuracy
+       value: 76.8
+       verified: false
+
+   - task:
+       type: text-generation
+     dataset:
+       type: cmb
+       name: CMB
+     metrics:
+     - name: accuracy
+       type: accuracy
+       value: 92.5
+       verified: false
+
+   - task:
+       type: text-generation
+     dataset:
+       type: cmexam
+       name: CMExam
+     metrics:
+     - name: accuracy
+       type: accuracy
+       value: 87.4
+       verified: false
+
+   - task:
+       type: text-generation
+     dataset:
+       type: pubmedqa
+       name: PubMedQA
+     metrics:
+     - name: accuracy
+       type: accuracy
+       value: 76.6
+       verified: false
+
+   - task:
+       type: text-generation
+     dataset:
+       type: medexqa
+       name: MedExQA
+     metrics:
+     - name: accuracy
+       type: accuracy
+       value: 81.5
+       verified: false
+
+   - task:
+       type: text-generation
+     dataset:
+       type: explaincpe
+       name: ExplainCPE
+     metrics:
+     - name: accuracy
+       type: accuracy
+       value: 89.5
+       verified: false
+
+   - task:
+       type: text-generation
+     dataset:
+       type: mmlu_med
+       name: MMLU-Med
+     metrics:
+     - name: accuracy
+       type: accuracy
+       value: 87.4
+       verified: false
+
+   # ----------------------------------------------------
+   # Clinical Reasoning
+   # ----------------------------------------------------
+   - task:
+       type: text-generation
+     dataset:
+       type: medxperqa
+       name: MedXperQA
+     metrics:
+     - name: accuracy
+       type: accuracy
+       value: 20.7
+       verified: false
+
+   - task:
+       type: text-generation
+     dataset:
+       type: anesbench
+       name: AnesBench
+     metrics:
+     - name: accuracy
+       type: accuracy
+       value: 53.1
+       verified: false
+
+   - task:
+       type: text-generation
+     dataset:
+       type: diagnosisarena
+       name: DiagnosisArena
+     metrics:
+     - name: accuracy
+       type: accuracy
+       value: 64.4
+       verified: false
+
+   - task:
+       type: text-generation
+     dataset:
+       type: clinbench_hbp
+       name: Clinbench-HBP
+     metrics:
+     - name: accuracy
+       type: accuracy
+       value: 80.6
+       verified: false
+
+   # ----------------------------------------------------
+   # Medical Standard
+   # ----------------------------------------------------
+   - task:
+       type: text-generation
+     dataset:
+       type: medpair
+       name: MedPAIR
+     metrics:
+     - name: accuracy
+       type: accuracy
+       value: 32.3
+       verified: false
+
+   - task:
+       type: text-generation
+     dataset:
+       type: amqa
+       name: AMQA
+     metrics:
+     - name: accuracy
+       type: accuracy
+       value: 72.7
+       verified: false
+
+   - task:
+       type: text-generation
+     dataset:
+       type: medethicaleval
+       name: MedethicalEval
+     metrics:
+     - name: accuracy
+       type: accuracy
+       value: 92.2
+       verified: false
+
+ ---
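
Since the updated card declares `library_name: transformers` and `pipeline_tag: text-generation`, a quick-start snippet would be a natural next addition to the README body. The sketch below is illustrative only: the Hub repo id `TiramisuQiao/Med-Go-32B` is an assumption inferred from the committer and the model-index name, the prompt and generation settings are placeholders, and the chat template is assumed to follow the Qwen3 base model.

```python
# Minimal usage sketch for the card above; assumptions are flagged inline.
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id -- inferred from committer + model-index name, not confirmed.
model_id = "TiramisuQiao/Med-Go-32B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native dtype
    device_map="auto",    # place layers across available devices
)

# Build a chat-formatted prompt; template assumed to follow the Qwen3 base.
messages = [
    {"role": "user", "content": "Summarize first-line management of type 2 diabetes."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```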