Update README.md
README.md
@@ -76,10 +76,10 @@ Our evaluation is based on the framework lm-evaluation-harness and opencompass.

- Huggingface LLM Leaderboard tasks.
- Other Popular Benchmarks: We report the average accuracies on Big Bench Hard (BBH, 3-shot) and HumanEval. A reproduction sketch follows the results table below.

| Model | Average | MMLU | Winogrande | TruthfulQA | Hellaswag | GSM8K | Arc-C | HumanEval | BBH |
| ------- | ------ | ---------- | ---------- | --------- | ------ | ------ | --------- | ---- | ------- |
| Bamboo | **57.1** | 63.89 | 76.16 | 44.06 | 82.17 | 52.84 | 62.20 | 25.6 | 50.35 |
| Mistral-v0.1 | **56.5** | 62.65 | 79.24 | 42.62 | 83.32 | 40.18 | 61.43 | 26.21 | 56.35 |
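
The numbers above come from lm-evaluation-harness (with opencompass used for some benchmarks such as HumanEval). The snippet below is only a minimal sketch of how such a run could look with the harness's Python API: the model path is a placeholder, and the task names and few-shot counts are assumptions following the common Open LLM Leaderboard settings plus the 3-shot BBH noted above, not the exact configuration used for this table.

```python
# Minimal sketch (not the exact setup behind the table): leaderboard-style
# scoring with lm-evaluation-harness v0.4.x.
import lm_eval
from lm_eval.models.huggingface import HFLM

# Placeholder model path; substitute the Bamboo checkpoint you want to score.
lm = HFLM(pretrained="path/to/bamboo-base", batch_size=8)

# (task, num_fewshot) pairs; names and shot counts are assumptions based on
# the Open LLM Leaderboard convention and the 3-shot BBH mentioned above.
# HumanEval is omitted here (evaluated separately, e.g. with opencompass).
TASKS = [
    ("mmlu", 5),
    ("arc_challenge", 25),
    ("hellaswag", 10),
    ("truthfulqa_mc2", 0),
    ("winogrande", 5),
    ("gsm8k", 5),
    ("bbh", 3),
]

for task, shots in TASKS:
    out = lm_eval.simple_evaluate(model=lm, tasks=[task], num_fewshot=shots)
    print(task, out["results"][task])  # per-task metric dict (e.g. acc / acc_norm)
```

Averaging the resulting per-task accuracies yields a single summary number comparable to the Average column above.
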
## Inference Speed Evaluation Results