# Kokoro TTS v0.19 - Intel iGPU Optimized

## 🎙️ Professional Text-to-Speech Model

This repository contains the **Kokoro TTS v0.19** model optimized for Intel integrated GPU acceleration. It is part of the Unicorn Orator platform by Magic Unicorn Unconventional Technology & Stuff Inc.

### Key Features

- **50+ Professional Voices**: American and British accents with a range of emotions and styles
- **Intel iGPU Accelerated**: 3-5x faster than CPU using OpenVINO
- **OpenAI API Compatible**: Drop-in replacement for the OpenAI TTS endpoint
- **Production Ready**: Used in Unicorn Orator commercial deployments

## Model Files

| File | Size | Description |
|------|------|-------------|
| `kokoro-v0_19.onnx` | 311MB | Main TTS model (ONNX format) |
| `voices-v1.0.bin` | 25MB | 50+ voice embeddings |
| `phoneme_mapping.json` | 12KB | Text-to-phoneme vocabulary |

## Quick Start

### Using with Unicorn Orator (Recommended)

```bash
docker pull magicunicorn/unicorn-orator:intel-igpu-v1.0
docker run -p 8885:8880 magicunicorn/unicorn-orator:intel-igpu-v1.0
```

### Direct Python Usage

```python
import onnxruntime as ort
import numpy as np

# Load model with Intel iGPU optimization
providers = [('OpenVINOExecutionProvider', {
    'device_type': 'GPU',
    'precision': 'FP16'
})]
session = ort.InferenceSession('kokoro-v0_19.onnx', providers=providers)

# Run inference (phoneme_ids and voice_embedding must be prepared first;
# see the sketch below)
outputs = session.run(None, {
    'tokens': phoneme_ids,                      # Text as phoneme IDs
    'style': voice_embedding,                   # 256-dim voice vector
    'speed': np.array([1.0], dtype=np.float32)  # Speech rate
})
audio = outputs[0]  # 24kHz audio waveform
```
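The snippet above assumes `phoneme_ids` and `voice_embedding` already exist. The sketch below shows one possible way to prepare those inputs and save the output; it is illustrative only and assumes `voices-v1.0.bin` deserializes to a `{voice_name: embedding}` archive readable by NumPy, that `phoneme_mapping.json` is a `{phoneme_symbol: integer_id}` vocabulary, and that text has already been converted to phoneme symbols by a grapheme-to-phoneme step. Adapt the loaders to the actual file formats.

```python
import json
import numpy as np
import onnxruntime as ort
import soundfile as sf

# Illustrative sketch only. Assumptions (not verified against the shipped files):
#  - voices-v1.0.bin can be read by NumPy as a {voice_name: embedding} archive
#  - phoneme_mapping.json is a {phoneme_symbol: integer_id} vocabulary
#  - the input text has already been run through a G2P front end
def load_voice(path: str, name: str) -> np.ndarray:
    voices = np.load(path, allow_pickle=True)
    return np.asarray(voices[name], dtype=np.float32)

def phonemes_to_ids(phonemes: list, vocab_path: str) -> np.ndarray:
    with open(vocab_path) as f:
        vocab = json.load(f)
    return np.asarray([[vocab[p] for p in phonemes]], dtype=np.int64)

session = ort.InferenceSession(
    'kokoro-v0_19.onnx',
    providers=[('OpenVINOExecutionProvider',
                {'device_type': 'GPU', 'precision': 'FP16'})])

phoneme_ids = phonemes_to_ids(['h', 'ə', 'l', 'oʊ'], 'phoneme_mapping.json')  # hypothetical symbols
voice_embedding = load_voice('voices-v1.0.bin', 'af_bella')

audio = session.run(None, {
    'tokens': phoneme_ids,
    'style': voice_embedding,
    'speed': np.array([1.0], dtype=np.float32),
})[0]

sf.write('output.wav', np.squeeze(audio), 24000)  # model outputs 24 kHz audio
```

For production use, the Unicorn Orator container above wraps this preparation behind the OpenAI-compatible API, so manual input handling is only needed for direct ONNX usage.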
## Voice Embeddings

The `voices-v1.0.bin` file contains 50+ pre-trained voices:

### American Voices
- `af_bella` - Professional female narrator
- `af_sarah` - Warm, friendly tone
- `af_sky` - Young, energetic
- `am_michael` - Deep male narrator
- `am_adam` - Business professional

### British Voices
- `bf_emma` - BBC-style presenter
- `bm_george` - Documentary narrator

### Special Voices
- `af_heart` - Emotional, storytelling
- `am_echo` - Robotic/AI assistant
- And 40+ more...

## Intel iGPU Optimization

### Why Intel iGPU?
- **Power Efficient**: 15W TDP vs 75W+ for discrete GPUs
- **No Extra Hardware**: Uses the integrated graphics already present in Intel CPUs
- **Shared Memory**: Zero-copy access to system RAM
- **Wide Availability**: Present in most modern Intel laptops and desktops

### Supported Hardware
- Intel Iris Xe (96 EU) - 11th gen and newer
- Intel Arc iGPU (128 EU) - Meteor Lake
- Intel UHD Graphics (32 EU) - Budget systems

### Performance

On Intel Iris Xe (i7-1165G7):
- **Speed**: 150ms per sentence
- **Memory**: <500MB total
- **Speedup**: 3x faster than CPU

## Model Architecture

### Input Tensors
1. **tokens** (int64): Phoneme IDs from text
2. **style** (float32, 256): Voice embedding vector
3. **speed** (float32, 1): Speech rate multiplier (0.5-2.0)

### Output
- **audio** (float32): Raw waveform at 24kHz sample rate

### Technical Details
- **Framework**: ONNX Runtime with OpenVINO
- **Precision**: FP32 model, FP16 inference
- **Opset**: ONNX opset 20
- **Optimization**: Graph fusion, kernel optimization

## Installation

### Prerequisites

```bash
# Intel GPU drivers
sudo apt-get install intel-opencl-icd intel-level-zero-gpu level-zero

# Python packages
pip install onnxruntime-openvino==1.17.0
pip install numpy soundfile
```

## API Usage Examples

### Basic TTS
```python
from kokoro_tts import KokoroTTS

tts = KokoroTTS(device='igpu')
audio = tts.synthesize("Hello world!", voice="af_bella")
```

### Batch Processing
```python
texts = ["First sentence.", "Second sentence."]
audios = tts.batch_synthesize(texts, voice="am_michael")
```

### Custom Voice Mixing
```python
# Blend two voices
voice_blend = 0.7 * voices['af_bella'] + 0.3 * voices['af_sarah']
audio = tts.synthesize("Blended voice test", style=voice_blend)
```

## Benchmarks

### Intel iGPU vs Other Platforms

| Platform | Hardware | Latency | Power | Cost |
|----------|----------|---------|-------|------|
| Intel iGPU | Iris Xe | 150ms | 15W | Integrated |
| CPU | i7-1165G7 | 450ms | 35W | Integrated |
| NVIDIA GPU | RTX 3060 | 50ms | 170W | $300+ |
| Apple M1 | Neural Engine | 100ms | 10W | Integrated |

## Use Cases

- **Audiobook Narration**: Long-form content with a consistent voice
- **Podcast Production**: Multi-speaker dialogue generation
- **Video Voiceovers**: Commercial and YouTube content
- **Accessibility**: Screen readers and assistive technology
- **Interactive AI**: Voice assistants and chatbots

## License

MIT License - free for commercial use.

## Citation

If you use Kokoro TTS in research:

```bibtex
@software{kokoro_tts_2024,
  title  = {Kokoro TTS v0.19 - Intel iGPU Optimized},
  author = {{Magic Unicorn Unconventional Technology \& Stuff Inc}},
  year   = {2024},
  url    = {https://huggingface.co/magicunicorn/kokoro-tts-intel}
}
```

## Links

- **Docker Hub**: [magicunicorn/unicorn-orator](https://hub.docker.com/r/magicunicorn/unicorn-orator)
- **GitHub**: [Unicorn-Orator](https://github.com/Unicorn-Commander/Unicorn-Orator)
- **Execution Engine**: [Unicorn-Execution-Engine](https://github.com/Unicorn-Commander/Unicorn-Execution-Engine)

## Support

For issues or questions:
- GitHub Issues: [Unicorn-Orator/issues](https://github.com/Unicorn-Commander/Unicorn-Orator/issues)
- HuggingFace Discussions: Enable in repo settings

---

*Powered by Magic Unicorn Unconventional Technology & Stuff Inc* 🦄