Poro 2 8B shows substantial improvements in Finnish capabilities over Llama 3.1 8B, while maintaining English performance:

### Finnish Performance

| Benchmark     | Poro 2 8B | Llama 3.1 8B |
|---------------|-----------|--------------|
| ARC Challenge | **48.90** | 38.82        |
| HellaSwag     | **50.49** | 30.97        |
| MMLU          | **56.25** | 49.64        |
| TruthfulQA    | **49.78** | 45.54        |

### English Performance

| Benchmark     | Poro 2 8B | Llama 3.1 8B |
|---------------|-----------|--------------|
| ARC Challenge | **60.75** | 57.94        |
| HellaSwag     | **80.55** | 80.05        |
| MMLU          | 63.48     | **65.08**    |
| TruthfulQA    | 48.06     | **54.02**    |

### Translation Performance

| Direction  | Poro 2 8B | Llama 3.1 8B |
|------------|-----------|--------------|
| EN→FI BLEU | **36.48** | 23.92        |
| FI→EN BLEU | **40.71** | 37.42        |

**Overall**: ~10 percentage point average improvement in Finnish benchmarks with only ~1 percentage point decrease in English performance.
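The headline averages follow directly from the tables above; a minimal sketch of the arithmetic:

```python
# Check the summary claim against the per-benchmark scores:
# ~10 pp average Finnish gain, ~1 pp average English drop.
finnish = {"ARC Challenge": (48.90, 38.82), "HellaSwag": (50.49, 30.97),
           "MMLU": (56.25, 49.64), "TruthfulQA": (49.78, 45.54)}
english = {"ARC Challenge": (60.75, 57.94), "HellaSwag": (80.55, 80.05),
           "MMLU": (63.48, 65.08), "TruthfulQA": (48.06, 54.02)}

def avg_delta(scores):
    # Mean of (Poro 2 8B - Llama 3.1 8B), in percentage points.
    return sum(poro - llama for poro, llama in scores.values()) / len(scores)

print(f"Finnish: {avg_delta(finnish):+.2f} pp")  # +10.11 pp
print(f"English: {avg_delta(english):+.2f} pp")  # -1.06 pp
```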