Update README.md
Browse files
README.md
CHANGED
|
@@ -28,9 +28,36 @@ model = AutoModelForCausalLM.from_pretrained("Shengkun/DarwinLM-2.7B-Pruned", tr
|
|
| 28 |
| Method | Param. | SciQ | PIQA | WG | ArcE | ArcC | HS | LogiQA | BoolQ | Avg |
|
| 29 |
|----------------------------|--------|------|------|------|------|------|------|--------|-------|------|
|
| 30 |
| **Dense** | 6.7B | 93.7 | 78.1 | 69.3 | 76.4 | 53.0 | 78.6 | 30.7 | 77.7 | 69.2 |
|
| 31 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 32 |
|
| 33 |
-
**(Results for 4.6B and 8.4B)**
|
| 34 |
|
| 35 |
## Installation
|
| 36 |
|
|
|
|
| 28 |
| Method | Param. | SciQ | PIQA | WG | ArcE | ArcC | HS | LogiQA | BoolQ | Avg |
|
| 29 |
|----------------------------|--------|------|------|------|------|------|------|--------|-------|------|
|
| 30 |
| **Dense** | 6.7B | 93.7 | 78.1 | 69.3 | 76.4 | 53.0 | 78.6 | 30.7 | 77.7 | 69.2 |
|
| 31 |
+
| **Uniform** | 3.4B | 44.1 | 57.1 | 53.3 | 33.5 | 32.2 | 27.3 | 25.0 | 49.0 | 40.1 |
|
| 32 |
+
| **ZipLM** | 4.0B | 87.4 | 64.4 | 58.3 | 53.2 | 33.6 | 50.1 | 25.5 | 63.6 | 54.5 |
|
| 33 |
+
| **ShearedLLama** | 2.7B | 84.5 | 66.4 | 53.4 | 49.8 | 28.4 | 47.6 | 27.6 | 50.9 | 51.0 |
|
| 34 |
+
| *DarwinLM (one-shot)* | 2.7B | 85.6 | 70.8 | 55.8 | 63.3 | 38.1 | 53.2 | 28.5 | 62.7 | 57.2 |
|
| 35 |
+
| **ShearedLLama (50B)** | 2.7B | 90.8 | 75.8 | 64.2 | 67.0 | 41.2 | 70.8 | 28.2 | 63.0 | 62.6 |
|
| 36 |
+
| **ShearedLLama (10B†)** | 2.7B | 92.0 | 73.6 | 63.1 | 69.8 | 42.0 | 64.4 | 29.0 | 62.1 | 61.9 |
|
| 37 |
+
| *DarwinLM (10B)* | 2.6B | 90.8 | 72.2 | 65.1 | 68.5 | 45.0 | 67.2 | 28.5 | 64.6 | 62.8 |
|
| 38 |
+
|
| 39 |
+
**4.6B**
|
| 40 |
+
|
| 41 |
+
| Model | Method | Param. | SciQ | PIQA | WG | ArcE | ArcC | HS | LogiQA | BoolQ | MMLU | Avg |
|
| 42 |
+
|-----------------|------------------------|--------|------|------|------|------|------|------|--------|-------|------|------|
|
| 43 |
+
| **Llama-3.1-8B** | **Dense** | 8B | 96.3 | 81.2 | 74.3 | 81.4 | 58.2 | 81.7 | 31.1 | 84.0 | 65.2 | 72.8 |
|
| 44 |
+
| | **Uniform** | 4.5B | 29.1 | 53.6 | 51.7 | 26.0 | 23.6 | 27.1 | 25.5 | 62.1 | 25.7 | 36.1 |
|
| 45 |
+
| | **ZipLM** | 6B | 65.5 | 60.6 | 56.0 | 40.2 | 34.4 | 34.4 | 28.1 | 63.0 | 27.9 | 45.7 |
|
| 46 |
+
| | *DarwinLM (one-shot)* | 4.6B | 84.9 | 69.4 | 57.3 | 59.6 | 34.2 | 44.6 | 24.1 | 62.2 | 28.5 | 51.6 |
|
| 47 |
+
| | **OLMO (2.5T)** | 7B | 92.8 | 79.4 | 70.4 | 73.3 | 44.9 | 77.1 | 27.9 | 72.5 | 28.3 | 62.9 |
|
| 48 |
+
| | *DarwinLM (10.0B)* | 4.6B | 93.2 | 74.8 | 67.4 | 73.2 | 51.6 | 71.3 | 30.7 | 71.1 | 40.6 | 63.7 |
|
| 49 |
+
|
| 50 |
+
**8.4B**
|
| 51 |
+
|
| 52 |
+
| Model | Method | Param. | SciQ | PIQA | WG | ArcE | ArcC | HS | LogiQA | BoolQ | MMLU | Avg |
|
| 53 |
+
|---------------------------|------------------------|--------|------|------|------|------|------|------|--------|-------|------|------|
|
| 54 |
+
| **Qwen-2.5-14B-Instruct** | **Dense** | 14B | 96.8 | 81.9 | 79.1 | 85.7 | 72.8 | 85.1 | 38.5 | 87.9 | 80.0 | 78.6 |
|
| 55 |
+
| | **Uniform** | 8.6B | 78.2 | 72.7 | 57.6 | 76.1 | 45.6 | 47.0 | 28.1 | 61.6 | 45.5 | 56.9 |
|
| 56 |
+
| | **ZipLM** | 8.5B | 69.0 | 66.4 | 52.8 | 60.1 | 38.3 | 43.3 | 29.6 | 60.2 | 25.0 | 49.4 |
|
| 57 |
+
| | *DarwinLM (one-shot)* | 8.4B | 84.3 | 73.9 | 60.5 | 75.7 | 48.0 | 53.3 | 29.3 | 66.9 | 43.1 | 59.4 |
|
| 58 |
+
| | **OLMO-0424 (2.05T)** | 7B | 96.1 | 80.1 | 72.1 | 73.8 | 49.2 | 78.0 | 29.3 | 80.8 | 52.1 | 67.9 |
|
| 59 |
+
| | *DarwinLM (10.0B)* | 8.4B | 89.5 | 78.1 | 70.7 | 79.6 | 57.6 | 74.9 | 33.5 | 73.9 | 57.9 | 68.4 |
|
| 60 |
|
|
|
|
| 61 |
|
| 62 |
## Installation
|
| 63 |
|