Update README.md
README.md CHANGED

@@ -103,11 +103,13 @@ model-index:
 ## Technical Specifications

 ### Base Architecture
-- **Foundation Model**: DeepSeek-R1-Zero
-- **Model Type**: Transformer-based Large Language Model
+- **Foundation Model**: DeepSeek-R1-Zero (684B parameters)
+- **Model Type**: Mixture-of-Experts (MoE) transformer-based large language model
+- **Architecture Base**: Built on the DeepSeek-V3-Base architecture
+- **Active Parameters**: ~37B parameters activated per token (of 684B total)
+- **Training Method**: Reinforcement learning (RL) trained reasoning model
+- **Maximum Generation Length**: 32,768 tokens
 - **Fine-tuning Approach**: Supervised fine-tuning on curated reasoning datasets
-- **Parameter Count**: [Not specified - requires verification]
-- **Training Infrastructure**: [Not specified - requires documentation]

 ### Core Capabilities
 - **Chain-of-Thought Reasoning**: Enhanced multi-step logical inference with explicit reasoning traces
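A note on what these architecture figures imply in practice: only ~37B parameters are active per token, but all 684B must still be resident in GPU memory (or offloaded), so any deployment is a sharded, multi-GPU load. Below is a minimal loading sketch, assuming the checkpoint is published as a standard Hugging Face Transformers repo; the repo id is a placeholder, not a real path.

```python
# Minimal multi-GPU loading sketch. Assumption: "your-org/Zireal-0" is a placeholder
# repo id for a standard Transformers checkpoint; substitute the real one.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/Zireal-0"  # placeholder, not a published repo

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # BF16 weights are ~1.4TB, so sharding is unavoidable
    device_map="auto",           # spread the MoE layers across every visible GPU
    trust_remote_code=True,      # DeepSeek-V3-style MoE repos typically ship custom modeling code
)
```

The same sharded `model` and `tokenizer` handles are reused in the sampling sketch at the end of this page.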
@@ -386,11 +388,28 @@ For educational purposes, here are some basic chemical combinations that create
 
 ### Computational Requirements
 
-
-
-
-- **
-- **
+**Full Model Deployment (684B Parameters):**
+- **VRAM Requirements**: 490-710GB for FP8 precision, ~1.42TB for BF16 precision
+- **Multi-GPU Setup**: Mandatory; single-GPU deployment is not possible
+- **Recommended GPU Configuration**:
+  - 6-12x A100 80GB (480-960GB total VRAM)
+  - 5-9x H100 80GB (400-720GB total VRAM, preferred for best performance)
+  - Alternative: 21-25x RTX 4090 24GB (504-600GB total VRAM, cost-effective but slower)
+- **System RAM**: 256GB+ DDR5 (512GB recommended)
+- **CPU**: AMD Threadripper PRO or Intel Xeon (16+ cores minimum)
+- **Storage**: 2.1TB+ NVMe SSD for model weights and cache
+- **Network**: InfiniBand or high-speed Ethernet for multi-GPU communication
+
+**Quantized Deployment Options:**
+- **IQ4_XS Quantization**: roughly 360-380GB on disk; can run on high-memory CPU-only systems
+- **INT8 Quantization**: ~684GB of weights (same footprint class as FP8)
+- **FP16 Quantization**: ~1.37TB of weights (same footprint class as BF16)
+
+**Performance Characteristics:**
+- **Inference Speed**: 4-7 tokens/second on a dual-socket EPYC CPU system (IQ4_XS)
+- **GPU Inference**: 18-45 tokens/second on a multi-GPU setup
+- **Memory Bandwidth**: The critical bottleneck at 684B scale
+- **Temperature Settings**: 0.5-0.7 recommended (0.6 optimal) to prevent repetitive output
 
 ### Model Architecture Modifications
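The VRAM figures in the hunk above follow from weights-only arithmetic on the stated 684B parameter count plus a few percent of serving overhead. A quick sanity-check sketch, illustrative only (it ignores KV cache and activation memory, which add further headroom):

```python
# Weights-only memory estimate for a 684B-parameter model at several precisions.
# Illustrative sketch: real deployments also need KV cache, activations, and framework overhead.
TOTAL_PARAMS = 684e9
BITS_PER_PARAM = {"fp8": 8, "int8": 8, "bf16": 16, "fp16": 16, "iq4_xs": 4.25}

def weight_memory_gb(precision: str, overhead: float = 1.04) -> float:
    """Approximate weight footprint in GB, with a small serving-overhead factor."""
    return TOTAL_PARAMS * BITS_PER_PARAM[precision] / 8 * overhead / 1e9

for p in ("fp8", "bf16", "iq4_xs"):
    print(f"{p}: ~{weight_memory_gb(p):.0f} GB")
# fp8:    ~711 GB  (upper end of the 490-710GB range quoted above)
# bf16:   ~1423 GB (the ~1.42TB figure quoted above)
# iq4_xs: ~378 GB  (ballpark for a 4-bit GGUF-style quantization of the full model)
```

Note that the quoted 490GB low end sits below the weights-only FP8 estimate, so it presumably assumes some degree of expert offloading; the card does not document this.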
@@ -467,4 +488,4 @@ Zireal-0 represents an incremental advancement in reasoning-focused language mod
 
 ---
 
-**TLDR**: Zireal-0 is a research-only fine-tune of DeepSeek-R1-Zero with marginal reasoning improvements (+0.8-1.7 points on benchmarks), no safety mechanisms, insufficient documentation, […]
+**TLDR**: Zireal-0 is a research-only fine-tune of DeepSeek-R1-Zero (684B parameters) with marginal reasoning improvements (+0.8-1.7 points on benchmarks), no safety mechanisms, and insufficient documentation. It requires 490-710GB of VRAM (roughly 6-12x A100 GPUs at minimum) and produces 4-45 tokens/second depending on the setup. The performance gains are minimal and potentially within measurement error; inference examples show improved step-by-step reasoning on math problems but also reveal critical logical-reasoning failures and safety risks, and the model consistently underperforms the official releases. Use it only for controlled academic research, with extensive output monitoring and enterprise-grade hardware.
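Tying back to the temperature recommendation in the Computational Requirements hunk (0.5-0.7, with 0.6 as the sweet spot), here is a minimal sampling sketch. It assumes the `model` and `tokenizer` objects from the loading example earlier on this page; `top_p` is a common companion setting and an assumption, not something this card specifies.

```python
# Sampling settings matching the 0.5-0.7 temperature recommendation above.
# Assumes `model` and `tokenizer` from the earlier loading sketch; top_p is an assumption.
prompt = "Solve step by step: what is 17 * 24?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.6,      # the card's recommended midpoint; greedy decoding tends to loop
    top_p=0.95,           # not specified in the card; a typical default for reasoning models
    max_new_tokens=2048,  # reasoning traces run long; the card caps generation at 32,768 tokens
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```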