typo fix
README.md
CHANGED
@@ -272,7 +272,7 @@ Granite-3.0-3B-A800M-Base is based on a decoder-only sparse Mixture of Experts (
 | Initialization std | 0.1 | 0.1 | 0.1 | **0.1** |
 | Sequence Length | 4096 | 4096 | 4096 | **4096** |
 | Position Embedding | RoPE | RoPE | RoPE | **RoPE** |
-| #
+| # Parameters | 2.5B | 8.1B | 1.3B | **3.3B** |
 | # Active Parameters | 2.5B | 8.1B | 400M | **800M** |
 | # Training tokens | 12T | 12T | 10T | **10T** |