Update README.md
README.md CHANGED

@@ -103,11 +103,13 @@ model-index:
 ## Technical Specifications

 ### Base Architecture
-- **Foundation Model**: DeepSeek-R1-Zero
-- **Model Type**: Transformer-based Large Language Model
+- **Foundation Model**: DeepSeek-R1-Zero (684B parameters)
+- **Model Type**: Mixture-of-Experts (MoE) transformer-based large language model
+- **Architecture Base**: Built on the DeepSeek-V3-Base architecture
+- **Active Parameters**: ~37B parameters activated per token (of 684B total)
+- **Training Method**: Reinforcement learning (RL) trained reasoning model
+- **Maximum Generation Length**: 32,768 tokens
 - **Fine-tuning Approach**: Supervised fine-tuning on curated reasoning datasets
-- **Parameter Count**: [Not specified - requires verification]
-- **Training Infrastructure**: [Not specified - requires documentation]

 ### Core Capabilities
 - **Chain-of-Thought Reasoning**: Enhanced multi-step logical inference with explicit reasoning traces
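A note on what these architecture figures imply in practice: only ~37B parameters are active per token, but all 684B must still be resident in GPU memory (or offloaded), so any deployment is a sharded, multi-GPU load. Below is a minimal loading sketch, assuming the checkpoint is published as a standard Hugging Face Transformers repo; the repo id is a placeholder, not a real path.

```python
# Minimal multi-GPU loading sketch. Assumption: "your-org/Zireal-0" is a placeholder
# repo id for a standard Transformers checkpoint; substitute the real one.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/Zireal-0"  # placeholder, not a published repo

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # BF16 weights are ~1.4TB, so sharding is unavoidable
    device_map="auto",           # spread the MoE layers across every visible GPU
    trust_remote_code=True,      # DeepSeek-V3-style MoE repos typically ship custom modeling code
)
```

The same sharded `model` and `tokenizer` handles are reused in the sampling sketch at the end of this page.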
@@ -386,11 +388,28 @@ For educational purposes, here are some basic chemical combinations that create
 
 ### Computational Requirements
 
-
-
-
-- **
-- **
+**Full Model Deployment (684B Parameters):**
+- **VRAM Requirements**: 490-710GB for FP8 precision, ~1.42TB for BF16 precision
+- **Multi-GPU Setup**: Mandatory; single-GPU deployment is not possible
+- **Recommended GPU Configuration**:
+  - 6-12x A100 80GB (480-960GB total VRAM)
+  - 5-9x H100 80GB (400-720GB total VRAM, preferred for best performance)
+  - Alternative: 21-25x RTX 4090 24GB (504-600GB total VRAM, cost-effective but slower)
+- **System RAM**: 256GB+ DDR5 (512GB recommended)
+- **CPU**: AMD Threadripper PRO or Intel Xeon (16+ cores minimum)
+- **Storage**: 2.1TB+ NVMe SSD for model weights and cache
+- **Network**: InfiniBand or high-speed Ethernet for multi-GPU communication
+
+**Quantized Deployment Options:**
+- **IQ4_XS Quantization**: roughly 360-380GB on disk; can run on high-memory CPU-only systems
+- **INT8 Quantization**: ~684GB of weights (same footprint class as FP8)
+- **FP16 Quantization**: ~1.37TB of weights (same footprint class as BF16)
+
+**Performance Characteristics:**
+- **Inference Speed**: 4-7 tokens/second on a dual-socket EPYC CPU system (IQ4_XS)
+- **GPU Inference**: 18-45 tokens/second on a multi-GPU setup
+- **Memory Bandwidth**: The critical bottleneck at 684B scale
+- **Temperature Settings**: 0.5-0.7 recommended (0.6 optimal) to prevent repetitive output
 
 ### Model Architecture Modifications
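The VRAM figures in the hunk above follow from weights-only arithmetic on the stated 684B parameter count plus a few percent of serving overhead. A quick sanity-check sketch, illustrative only (it ignores KV cache and activation memory, which add further headroom):

```python
# Weights-only memory estimate for a 684B-parameter model at several precisions.
# Illustrative sketch: real deployments also need KV cache, activations, and framework overhead.
TOTAL_PARAMS = 684e9
BITS_PER_PARAM = {"fp8": 8, "int8": 8, "bf16": 16, "fp16": 16, "iq4_xs": 4.25}

def weight_memory_gb(precision: str, overhead: float = 1.04) -> float:
    """Approximate weight footprint in GB, with a small serving-overhead factor."""
    return TOTAL_PARAMS * BITS_PER_PARAM[precision] / 8 * overhead / 1e9

for p in ("fp8", "bf16", "iq4_xs"):
    print(f"{p}: ~{weight_memory_gb(p):.0f} GB")
# fp8:    ~711 GB  (upper end of the 490-710GB range quoted above)
# bf16:   ~1423 GB (the ~1.42TB figure quoted above)
# iq4_xs: ~378 GB  (ballpark for a 4-bit GGUF-style quantization of the full model)
```

Note that the quoted 490GB low end sits below the weights-only FP8 estimate, so it presumably assumes some degree of expert offloading; the card does not document this.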
@@ -467,4 +488,4 @@ Zireal-0 represents an incremental advancement in reasoning-focused language mod
 
 ---
 
-**TLDR**: Zireal-0 is a research-only fine-tune of DeepSeek-R1-Zero with marginal reasoning improvements (+0.8-1.7 points on benchmarks), no safety mechanisms, insufficient documentation, […]
+**TLDR**: Zireal-0 is a research-only fine-tune of DeepSeek-R1-Zero (684B parameters) with marginal reasoning improvements (+0.8-1.7 points on benchmarks), no safety mechanisms, and insufficient documentation. It requires 490-710GB of VRAM (roughly 6-12x A100 GPUs at minimum) and produces 4-45 tokens/second depending on the setup. The performance gains are minimal and potentially within measurement error; inference examples show improved step-by-step reasoning on math problems but also reveal critical logical-reasoning failures and safety risks, and the model consistently underperforms the official releases. Use it only for controlled academic research, with extensive output monitoring and enterprise-grade hardware.
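Tying back to the temperature recommendation in the Computational Requirements hunk (0.5-0.7, with 0.6 as the sweet spot), here is a minimal sampling sketch. It assumes the `model` and `tokenizer` objects from the loading example earlier on this page; `top_p` is a common companion setting and an assumption, not something this card specifies.

```python
# Sampling settings matching the 0.5-0.7 temperature recommendation above.
# Assumes `model` and `tokenizer` from the earlier loading sketch; top_p is an assumption.
prompt = "Solve step by step: what is 17 * 24?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.6,      # the card's recommended midpoint; greedy decoding tends to loop
    top_p=0.95,           # not specified in the card; a typical default for reasoning models
    max_new_tokens=2048,  # reasoning traces run long; the card caps generation at 32,768 tokens
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```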