Upload folder using huggingface_hub
- README.md +190 -0
- kokoro-v0_19.onnx +3 -0
- voices-v1.0.bin +3 -0
README.md
ADDED
|
@@ -0,0 +1,190 @@
| 1 |
+
# Kokoro TTS v0.19 - Intel iGPU Optimized
|
| 2 |
+
|
| 3 |
+
## 🎙️ Professional Text-to-Speech Model
|
| 4 |
+
|
| 5 |
+
This repository contains the **Kokoro TTS v0.19** model optimized for Intel integrated GPU acceleration. It is part of the Unicorn Orator platform by Magic Unicorn Unconventional Technology & Stuff Inc.
|
| 6 |
+
|
| 7 |
+
### Key Features
|
| 8 |
+
- **50+ Professional Voices**: American, British, various emotions and styles
|
| 9 |
+
- **Intel iGPU Accelerated**: 3-5x faster than CPU using OpenVINO
|
| 10 |
+
- **OpenAI API Compatible**: Drop-in replacement for OpenAI TTS
|
| 11 |
+
- **Production Ready**: Used in Unicorn Orator commercial deployments
|
| 12 |
+
|
| 13 |
+
## Model Files
|
| 14 |
+
|
| 15 |
+
| File | Size | Description |
|
| 16 |
+
|------|------|-------------|
|
| 17 |
+
| `kokoro-v0_19.onnx` | 311MB | Main TTS model (ONNX format) |
|
| 18 |
+
| `voices-v1.0.bin` | 25MB | 50+ voice embeddings |
|
| 19 |
+
| `phoneme_mapping.json` | 12KB | Text-to-phoneme vocabulary (not among the files added in this commit) |
|
| 20 |
+
|
| 21 |
+
## Quick Start
|
| 22 |
+
|
| 23 |
+
### Using with Unicorn Orator (Recommended)
|
| 24 |
+
```bash
|
| 25 |
+
docker pull magicunicorn/unicorn-orator:intel-igpu-v1.0
|
| 26 |
+
docker run -p 8885:8880 magicunicorn/unicorn-orator:intel-igpu-v1.0
|
| 27 |
+
```
|
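Because the container is described as OpenAI API compatible, a client would presumably address it the way it addresses OpenAI's speech endpoint. The sketch below only builds the request and does not send it. The `/v1/audio/speech` path and the `model`/`input`/`voice` fields follow OpenAI's TTS API; that Unicorn Orator serves the same path, the model name `kokoro`, and the helper `build_speech_request` are assumptions not confirmed by this README. Host port 8885 comes from the `-p 8885:8880` mapping above.

```python
import json
import urllib.request

# Host port 8885 is taken from the `-p 8885:8880` mapping in the docker command.
BASE_URL = "http://localhost:8885"


def build_speech_request(text, voice="af_bella", model="kokoro"):
    """Build (but do not send) an OpenAI-style text-to-speech request.

    Assumptions: the server exposes OpenAI's /v1/audio/speech route, and
    "kokoro" is a hypothetical model name; check the Unicorn Orator docs.
    """
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_speech_request("Hello world!")
# To call a running container and read the audio bytes:
# with urllib.request.urlopen(request) as response:
#     audio_bytes = response.read()
```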
| 28 |
+
|
| 29 |
+
### Direct Python Usage
|
| 30 |
+
```python
|
| 31 |
+
import onnxruntime as ort
|
| 32 |
+
import numpy as np
|
| 33 |
+
|
| 34 |
+
# Load model with Intel iGPU optimization
|
| 35 |
+
providers = [('OpenVINOExecutionProvider', {
|
| 36 |
+
'device_type': 'GPU',
|
| 37 |
+
'precision': 'FP16'
|
| 38 |
+
})]
|
| 39 |
+
|
| 40 |
+
session = ort.InferenceSession('kokoro-v0_19.onnx', providers=providers)
|
| 41 |
+
|
| 42 |
+
# Run inference; phoneme_ids (int64) and voice_embedding (float32, 256-dim) must be prepared beforehand
|
| 43 |
+
outputs = session.run(None, {
|
| 44 |
+
'tokens': phoneme_ids, # Text as phoneme IDs
|
| 45 |
+
'style': voice_embedding, # 256-dim voice vector
|
| 46 |
+
'speed': np.array([1.0], dtype=np.float32)  # Speech rate (model expects float32, NumPy defaults to float64)
|
| 47 |
+
})
|
| 48 |
+
|
| 49 |
+
audio = outputs[0] # 24kHz audio waveform
|
| 50 |
+
```
|
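The snippet above stops at a raw 24kHz waveform. A minimal sketch of writing such samples to a 16-bit PCM WAV file with only the standard library follows. It assumes the samples are mono floats nominally in [-1.0, 1.0], which the README does not state, and the helper name `save_wav` is hypothetical. A short sine wave stands in for real model output.

```python
import array
import math
import os
import sys
import tempfile
import wave

SAMPLE_RATE = 24_000  # 24 kHz, as stated for the model output


def save_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono float samples to a 16-bit PCM WAV file.

    Assumes samples are floats nominally in [-1.0, 1.0]; values outside
    that range are clipped. Accepts any iterable of floats, so a flattened
    NumPy array (e.g. audio.ravel()) works as well.
    """
    pcm = array.array("h")  # signed 16-bit integers
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))
        pcm.append(int(round(clipped * 32767)))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)  # mono
        wav_file.setsampwidth(2)  # 2 bytes = 16 bits
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm.tobytes())
    return len(pcm)


# Stand-in for model output: 0.1 s of a 440 Hz sine wave.
demo = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(2400)]
demo_path = os.path.join(tempfile.gettempdir(), "kokoro_demo.wav")
frames_written = save_wav(demo_path, demo)
```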
| 51 |
+
|
| 52 |
+
## Voice Embeddings
|
| 53 |
+
|
| 54 |
+
The `voices-v1.0.bin` file contains 50+ pre-trained voices:
|
| 55 |
+
|
| 56 |
+
### American Voices
|
| 57 |
+
- `af_bella` - Professional female narrator
|
| 58 |
+
- `af_sarah` - Warm, friendly tone
|
| 59 |
+
- `af_sky` - Young, energetic
|
| 60 |
+
- `am_michael` - Deep male narrator
|
| 61 |
+
- `am_adam` - Business professional
|
| 62 |
+
|
| 63 |
+
### British Voices
|
| 64 |
+
- `bf_emma` - BBC-style presenter
|
| 65 |
+
- `bm_george` - Documentary narrator
|
| 66 |
+
|
| 67 |
+
### Special Voices
|
| 68 |
+
- `af_heart` - Emotional, storytelling
|
| 69 |
+
- `am_echo` - Robotic/AI assistant
|
| 70 |
+
- And 40+ more...
|
| 71 |
+
|
| 72 |
+
## Intel iGPU Optimization
|
| 73 |
+
|
| 74 |
+
### Why Intel iGPU?
|
| 75 |
+
- **Power Efficient**: 15W TDP vs 75W+ for discrete GPUs
|
| 76 |
+
- **No Extra Hardware**: Uses integrated graphics already in Intel CPUs
|
| 77 |
+
- **Shared Memory**: Zero-copy access to system RAM
|
| 78 |
+
- **Wide Availability**: Present in most modern Intel laptops/desktops
|
| 79 |
+
|
| 80 |
+
### Supported Hardware
|
| 81 |
+
- Intel Iris Xe (96 EU) - 11th gen and newer
|
| 82 |
+
- Intel Arc iGPU (128 EU) - Meteor Lake
|
| 83 |
+
- Intel UHD Graphics (32 EU) - Budget systems
|
| 84 |
+
|
| 85 |
+
### Performance
|
| 86 |
+
On Intel Iris Xe (i7-1165G7):
|
| 87 |
+
- **Latency**: 150ms per sentence
|
| 88 |
+
- **Memory**: <500MB total
|
| 89 |
+
- **Speedup**: 3x faster than CPU
|
| 90 |
+
|
| 91 |
+
## Model Architecture
|
| 92 |
+
|
| 93 |
+
### Input Tensors
|
| 94 |
+
1. **tokens** (int64): Phoneme IDs from text
|
| 95 |
+
2. **style** (float32, 256): Voice embedding vector
|
| 96 |
+
3. **speed** (float32, 1): Speech rate multiplier (0.5-2.0)
|
| 97 |
+
|
| 98 |
+
### Output
|
| 99 |
+
- **audio** (float32): Raw waveform at 24kHz sample rate
|
| 100 |
+
|
| 101 |
+
### Technical Details
|
| 102 |
+
- **Framework**: ONNX Runtime with OpenVINO
|
| 103 |
+
- **Precision**: FP32 model, FP16 inference
|
| 104 |
+
- **Opset**: ONNX opset 20
|
| 105 |
+
- **Optimization**: Graph fusion, kernel optimization
|
| 106 |
+
|
| 107 |
+
## Installation
|
| 108 |
+
|
| 109 |
+
### Prerequisites
|
| 110 |
+
```bash
|
| 111 |
+
# Intel GPU drivers
|
| 112 |
+
sudo apt-get install intel-opencl-icd intel-level-zero-gpu level-zero
|
| 113 |
+
|
| 114 |
+
# Python packages
|
| 115 |
+
pip install onnxruntime-openvino==1.17.0
|
| 116 |
+
pip install numpy soundfile
|
| 117 |
+
```
|
| 118 |
+
|
| 119 |
+
## API Usage Examples
|
| 120 |
+
|
| 121 |
+
### Basic TTS
|
| 122 |
+
```python
|
| 123 |
+
from kokoro_tts import KokoroTTS
|
| 124 |
+
|
| 125 |
+
tts = KokoroTTS(device='igpu')
|
| 126 |
+
audio = tts.synthesize("Hello world!", voice="af_bella")
|
| 127 |
+
```
|
| 128 |
+
|
| 129 |
+
### Batch Processing
|
| 130 |
+
```python
|
| 131 |
+
texts = ["First sentence.", "Second sentence."]
|
| 132 |
+
audios = tts.batch_synthesize(texts, voice="am_michael")
|
| 133 |
+
```
|
| 134 |
+
|
| 135 |
+
### Custom Voice Mixing
|
| 136 |
+
```python
|
| 137 |
+
# Blend two voices
|
| 138 |
+
voice_blend = 0.7 * voices['af_bella'] + 0.3 * voices['af_sarah']
|
| 139 |
+
audio = tts.synthesize("Blended voice test", style=voice_blend)
|
| 140 |
+
```
|
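The blend above is a weighted sum of two embedding vectors. As a rough sketch of the same idea for any number of voices, the hypothetical helper `blend_voices` below normalises the weights so they always sum to 1. It uses plain Python lists and toy 4-dimensional vectors in place of the real 256-dimensional embeddings; how a blend sounds in practice is not something this sketch can show.

```python
def blend_voices(embeddings, weights):
    """Return the weighted average of equal-length voice embedding vectors.

    Weights are normalised to sum to 1, so (0.7, 0.3) and (7, 3) give the
    same result. Works on plain sequences of floats.
    """
    if not embeddings or len(embeddings) != len(weights):
        raise ValueError("need one weight per embedding")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [
        sum(w / total * e[i] for e, w in zip(embeddings, weights))
        for i in range(dim)
    ]


# Toy 4-dim vectors stand in for the real 256-dim embeddings.
bella = [1.0, 0.0, 2.0, -1.0]
sarah = [0.0, 1.0, 2.0, 1.0]
blend = blend_voices([bella, sarah], [0.7, 0.3])
```

With NumPy arrays the README's one-line expression does the same job; the helper mainly adds the length checks and weight normalisation.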
| 141 |
+
|
| 142 |
+
## Benchmarks
|
| 143 |
+
|
| 144 |
+
### Intel iGPU vs Other Platforms
|
| 145 |
+
|
| 146 |
+
| Platform | Hardware | Latency | Power | Cost |
|
| 147 |
+
|----------|----------|---------|-------|------|
|
| 148 |
+
| Intel iGPU | Iris Xe | 150ms | 15W | Integrated |
|
| 149 |
+
| CPU | i7-1165G7 | 450ms | 35W | Integrated |
|
| 150 |
+
| NVIDIA GPU | RTX 3060 | 50ms | 170W | $300+ |
|
| 151 |
+
| Apple M1 | Neural Engine | 100ms | 10W | Integrated |
|
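Multiplying the Latency and Power columns gives a rough energy cost per sentence (energy = power × time). The sketch below does that arithmetic on the table's own numbers. It assumes the Power column is the average draw during synthesis; the README does not say whether these are measured values or TDP figures, so the results are illustrative only.

```python
# (latency in seconds, power in watts) copied from the table above.
platforms = {
    "Intel iGPU (Iris Xe)": (0.150, 15),
    "CPU (i7-1165G7)": (0.450, 35),
    "NVIDIA GPU (RTX 3060)": (0.050, 170),
    "Apple M1 (Neural Engine)": (0.100, 10),
}

# Energy in joules = power (W) x time (s).
energy_joules = {
    name: latency * power for name, (latency, power) in platforms.items()
}

# Print from least to most energy per sentence.
for name, joules in sorted(energy_joules.items(), key=lambda item: item[1]):
    print(f"{name}: {joules:.2f} J per sentence")
```

Under that assumption the iGPU comes to about 2.25 J per sentence against about 15.75 J on the CPU, roughly a 7x difference, while the RTX 3060 lands at about 8.5 J despite having the lowest latency.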
| 152 |
+
|
| 153 |
+
## Use Cases
|
| 154 |
+
|
| 155 |
+
- **Audiobook Narration**: Long-form content with consistent voice
|
| 156 |
+
- **Podcast Production**: Multi-speaker dialogue generation
|
| 157 |
+
- **Video Voiceovers**: Commercial and YouTube content
|
| 158 |
+
- **Accessibility**: Screen readers and assistive technology
|
| 159 |
+
- **Interactive AI**: Voice assistants and chatbots
|
| 160 |
+
|
| 161 |
+
## License
|
| 162 |
+
|
| 163 |
+
MIT License - Free for commercial use
|
| 164 |
+
|
| 165 |
+
## Citation
|
| 166 |
+
|
| 167 |
+
If you use Kokoro TTS in research:
|
| 168 |
+
```bibtex
|
| 169 |
+
@software{kokoro_tts_2024,
|
| 170 |
+
title = {Kokoro TTS v0.19 - Intel iGPU Optimized},
|
| 171 |
+
author = {{Magic Unicorn Unconventional Technology \& Stuff Inc}},
|
| 172 |
+
year = {2024},
|
| 173 |
+
url = {https://huggingface.co/magicunicorn/kokoro-tts-intel}
|
| 174 |
+
}
|
| 175 |
+
```
|
| 176 |
+
|
| 177 |
+
## Links
|
| 178 |
+
|
| 179 |
+
- **Docker Hub**: [magicunicorn/unicorn-orator](https://hub.docker.com/r/magicunicorn/unicorn-orator)
|
| 180 |
+
- **GitHub**: [Unicorn-Orator](https://github.com/Unicorn-Commander/Unicorn-Orator)
|
| 181 |
+
- **Execution Engine**: [Unicorn-Execution-Engine](https://github.com/Unicorn-Commander/Unicorn-Execution-Engine)
|
| 182 |
+
|
| 183 |
+
## Support
|
| 184 |
+
|
| 185 |
+
For issues or questions:
|
| 186 |
+
- GitHub Issues: [Unicorn-Orator/issues](https://github.com/Unicorn-Commander/Unicorn-Orator/issues)
|
| 187 |
+
- Hugging Face Discussions: Community tab of the model repository (must be enabled in repo settings)
|
| 188 |
+
|
| 189 |
+
---
|
| 190 |
+
*Powered by Magic Unicorn Unconventional Technology & Stuff Inc* 🦄
|
kokoro-v0_19.onnx
ADDED
|
@@ -0,0 +1,3 @@
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:dece567789190ebe987bd245d95c09d5ac86de28ff0c325c2e3faaf3de04442c
|
| 3 |
+
size 325525180
|
voices-v1.0.bin
ADDED
|
@@ -0,0 +1,3 @@
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:d19762d46cf0e6648cb28a7711df1637aad15818185d13f4ff840d57f2f6dfed
|
| 3 |
+
size 26124436
|
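The two Git LFS pointers above record a SHA-256 digest and a byte size for each binary, which makes it possible to check a downloaded file against the commit. A minimal sketch, using only the standard library, is shown below; the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the pointer text is copied from the `voices-v1.0.bin` entry.

```python
import hashlib
import os


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer ("key value" per line) into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, _, digest = fields["oid"].partition(":")
    return {"algorithm": algorithm, "digest": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Check a local file's size and SHA-256 against a parsed pointer."""
    if pointer["algorithm"] != "sha256":
        raise ValueError("only sha256 oids are handled in this sketch")
    if os.path.getsize(path) != pointer["size"]:
        return False  # cheap check first, avoids hashing a truncated file
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == pointer["digest"]


VOICES_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:d19762d46cf0e6648cb28a7711df1637aad15818185d13f4ff840d57f2f6dfed
size 26124436
"""

pointer = parse_lfs_pointer(VOICES_POINTER)
# After downloading the file: matches_pointer("voices-v1.0.bin", pointer)
```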