# Kokoro TTS v0.19 - Intel iGPU Optimized

## 🎙️ Professional Text-to-Speech Model

This repository contains the **Kokoro TTS v0.19** model optimized for Intel integrated GPU acceleration. It is part of the Unicorn Orator platform by Magic Unicorn Unconventional Technology & Stuff Inc.

### Key Features

- **50+ Professional Voices**: American and British accents with a range of emotions and styles
- **Intel iGPU Accelerated**: 3-5x faster than CPU using OpenVINO
- **OpenAI API Compatible**: Drop-in replacement for the OpenAI TTS endpoint
- **Production Ready**: Used in Unicorn Orator commercial deployments

## Model Files

| File | Size | Description |
|------|------|-------------|
| `kokoro-v0_19.onnx` | 311MB | Main TTS model (ONNX format) |
| `voices-v1.0.bin` | 25MB | 50+ voice embeddings |
| `phoneme_mapping.json` | 12KB | Text-to-phoneme vocabulary |

## Quick Start

### Using with Unicorn Orator (Recommended)

```bash
docker pull magicunicorn/unicorn-orator:intel-igpu-v1.0
docker run -p 8885:8880 magicunicorn/unicorn-orator:intel-igpu-v1.0
```

### Direct Python Usage

```python
import onnxruntime as ort
import numpy as np

# Load model with Intel iGPU optimization
providers = [('OpenVINOExecutionProvider', {
    'device_type': 'GPU',
    'precision': 'FP16'
})]
session = ort.InferenceSession('kokoro-v0_19.onnx', providers=providers)

# Run inference (phoneme_ids and voice_embedding must be prepared first;
# see the sketch below)
outputs = session.run(None, {
    'tokens': phoneme_ids,                      # Text as phoneme IDs
    'style': voice_embedding,                   # 256-dim voice vector
    'speed': np.array([1.0], dtype=np.float32)  # Speech rate
})
audio = outputs[0]  # 24kHz audio waveform
```
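The snippet above assumes `phoneme_ids` and `voice_embedding` already exist. The sketch below shows one possible way to prepare those inputs and save the output; it is illustrative only and assumes `voices-v1.0.bin` deserializes to a `{voice_name: embedding}` archive readable by NumPy, that `phoneme_mapping.json` is a `{phoneme_symbol: integer_id}` vocabulary, and that text has already been converted to phoneme symbols by a grapheme-to-phoneme step. Adapt the loaders to the actual file formats.

```python
import json
import numpy as np
import onnxruntime as ort
import soundfile as sf

# Illustrative sketch only. Assumptions (not verified against the shipped files):
#  - voices-v1.0.bin can be read by NumPy as a {voice_name: embedding} archive
#  - phoneme_mapping.json is a {phoneme_symbol: integer_id} vocabulary
#  - the input text has already been run through a G2P front end
def load_voice(path: str, name: str) -> np.ndarray:
    voices = np.load(path, allow_pickle=True)
    return np.asarray(voices[name], dtype=np.float32)

def phonemes_to_ids(phonemes: list, vocab_path: str) -> np.ndarray:
    with open(vocab_path) as f:
        vocab = json.load(f)
    return np.asarray([[vocab[p] for p in phonemes]], dtype=np.int64)

session = ort.InferenceSession(
    'kokoro-v0_19.onnx',
    providers=[('OpenVINOExecutionProvider',
                {'device_type': 'GPU', 'precision': 'FP16'})])

phoneme_ids = phonemes_to_ids(['h', 'ə', 'l', 'oʊ'], 'phoneme_mapping.json')  # hypothetical symbols
voice_embedding = load_voice('voices-v1.0.bin', 'af_bella')

audio = session.run(None, {
    'tokens': phoneme_ids,
    'style': voice_embedding,
    'speed': np.array([1.0], dtype=np.float32),
})[0]

sf.write('output.wav', np.squeeze(audio), 24000)  # model outputs 24 kHz audio
```

For production use, the Unicorn Orator container above wraps this preparation behind the OpenAI-compatible API, so manual input handling is only needed for direct ONNX usage.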
## Voice Embeddings

The `voices-v1.0.bin` file contains 50+ pre-trained voices:

### American Voices
- `af_bella` - Professional female narrator
- `af_sarah` - Warm, friendly tone
- `af_sky` - Young, energetic
- `am_michael` - Deep male narrator
- `am_adam` - Business professional

### British Voices
- `bf_emma` - BBC-style presenter
- `bm_george` - Documentary narrator

### Special Voices
- `af_heart` - Emotional, storytelling
- `am_echo` - Robotic/AI assistant
- And 40+ more...

## Intel iGPU Optimization

### Why Intel iGPU?
- **Power Efficient**: 15W TDP vs 75W+ for discrete GPUs
- **No Extra Hardware**: Uses the integrated graphics already present in Intel CPUs
- **Shared Memory**: Zero-copy access to system RAM
- **Wide Availability**: Present in most modern Intel laptops and desktops

### Supported Hardware
- Intel Iris Xe (96 EU) - 11th gen and newer
- Intel Arc iGPU (128 EU) - Meteor Lake
- Intel UHD Graphics (32 EU) - Budget systems

### Performance

On Intel Iris Xe (i7-1165G7):
- **Speed**: 150ms per sentence
- **Memory**: <500MB total
- **Speedup**: 3x faster than CPU

## Model Architecture

### Input Tensors
1. **tokens** (int64): Phoneme IDs from text
2. **style** (float32, 256): Voice embedding vector
3. **speed** (float32, 1): Speech rate multiplier (0.5-2.0)

### Output
- **audio** (float32): Raw waveform at 24kHz sample rate

### Technical Details
- **Framework**: ONNX Runtime with OpenVINO
- **Precision**: FP32 model, FP16 inference
- **Opset**: ONNX opset 20
- **Optimization**: Graph fusion, kernel optimization

## Installation

### Prerequisites

```bash
# Intel GPU drivers
sudo apt-get install intel-opencl-icd intel-level-zero-gpu level-zero

# Python packages
pip install onnxruntime-openvino==1.17.0
pip install numpy soundfile
```

## API Usage Examples

### Basic TTS
```python
from kokoro_tts import KokoroTTS

tts = KokoroTTS(device='igpu')
audio = tts.synthesize("Hello world!", voice="af_bella")
```

### Batch Processing
```python
texts = ["First sentence.", "Second sentence."]
audios = tts.batch_synthesize(texts, voice="am_michael")
```

### Custom Voice Mixing
```python
# Blend two voices
voice_blend = 0.7 * voices['af_bella'] + 0.3 * voices['af_sarah']
audio = tts.synthesize("Blended voice test", style=voice_blend)
```

## Benchmarks

### Intel iGPU vs Other Platforms

| Platform | Hardware | Latency | Power | Cost |
|----------|----------|---------|-------|------|
| Intel iGPU | Iris Xe | 150ms | 15W | Integrated |
| CPU | i7-1165G7 | 450ms | 35W | Integrated |
| NVIDIA GPU | RTX 3060 | 50ms | 170W | $300+ |
| Apple M1 | Neural Engine | 100ms | 10W | Integrated |

## Use Cases

- **Audiobook Narration**: Long-form content with a consistent voice
- **Podcast Production**: Multi-speaker dialogue generation
- **Video Voiceovers**: Commercial and YouTube content
- **Accessibility**: Screen readers and assistive technology
- **Interactive AI**: Voice assistants and chatbots

## License

MIT License - free for commercial use.

## Citation

If you use Kokoro TTS in research:

```bibtex
@software{kokoro_tts_2024,
  title  = {Kokoro TTS v0.19 - Intel iGPU Optimized},
  author = {{Magic Unicorn Unconventional Technology \& Stuff Inc}},
  year   = {2024},
  url    = {https://huggingface.co/magicunicorn/kokoro-tts-intel}
}
```

## Links

- **Docker Hub**: [magicunicorn/unicorn-orator](https://hub.docker.com/r/magicunicorn/unicorn-orator)
- **GitHub**: [Unicorn-Orator](https://github.com/Unicorn-Commander/Unicorn-Orator)
- **Execution Engine**: [Unicorn-Execution-Engine](https://github.com/Unicorn-Commander/Unicorn-Execution-Engine)

## Support

For issues or questions:
- GitHub Issues: [Unicorn-Orator/issues](https://github.com/Unicorn-Commander/Unicorn-Orator/issues)
- HuggingFace Discussions: Enable in repo settings

---

*Powered by Magic Unicorn Unconventional Technology & Stuff Inc* 🦄