# XTTS-v2 Model Mirror for Quantum Sync
This is a mirror/backup of the Coqui XTTS-v2 model for use with the Quantum Sync project.
## Purpose
This mirror serves as:
- Backup in case the original model becomes unavailable
- Faster access for Quantum Sync users
- Stable reference for production deployments
## Model Information
Original Model: coqui/XTTS-v2
Architecture: XTTS-v2 (zero-shot, multilingual TTS)
Model Size: ~1.87 GB
Supported Languages: 14
- English (en)
- Thai (th)
- Spanish (es)
- French (fr)
- German (de)
- Italian (it)
- Portuguese (pt)
- Polish (pl)
- Turkish (tr)
- Russian (ru)
- Dutch (nl)
- Czech (cs)
- Arabic (ar)
- Chinese (zh-cn)
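The API call is the same for every listed language; only the `language` code and the input text change. A minimal sketch, reusing the mirror repo id and the `tts_to_file` call shown in the usage examples below (the sample sentences are illustrative only):

```python
from TTS.api import TTS

# Load the model once, then switch languages per call.
tts = TTS(model_name="useclaude/quantum-sync-xtts-v2")

# Any of the language codes listed above work the same way.
samples = {
    "en": "Hello, this is a test.",
    "th": "สวัสดี นี่คือการทดสอบ",
    "es": "Hola, esto es una prueba.",
}

for lang, text in samples.items():
    tts.tts_to_file(
        text=text,
        speaker_wav="reference_voice.wav",  # one reference clip reused for every language
        language=lang,
        file_path=f"output_{lang}.wav",
    )
```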
## Usage

### With Quantum Sync (Recommended)
```bash
git clone https://github.com/Useforclaude/quantum-sync-v5.git
cd quantum-sync-v5/quantum-sync-v11-production

# Configure to use this mirror:
# edit tts_engines/xtts.py and change model_name to
#   model_name = "useclaude/quantum-sync-xtts-v2"

python main_v11.py input/file.srt \
    --voice MyVoice \
    --voice-sample /path/to/voice.wav \
    --tts-engine xtts-v2 \
    --tts-language en
```
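The repository's `tts_engines/xtts.py` is not reproduced here, so the following is only a hedged sketch of what that one-line change might look like; the actual variable layout in the file may differ:

```python
# tts_engines/xtts.py (illustrative excerpt only)

# Default upstream model:
# model_name = "coqui/XTTS-v2"

# Point the engine at this mirror instead:
model_name = "useclaude/quantum-sync-xtts-v2"
```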
### Direct Usage with TTS Library
```python
from TTS.api import TTS

# Use this mirror
tts = TTS(model_name="useclaude/quantum-sync-xtts-v2")

# Generate speech
tts.tts_to_file(
    text="Hello, this is a test.",
    speaker_wav="reference_voice.wav",
    language="en",
    file_path="output.wav",
)
```
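Synthesis is far faster on a GPU (see the Performance section below). Recent releases of the TTS API expose a PyTorch-style `.to()` method on the `TTS` object; a minimal sketch, assuming your installed version supports it:

```python
import torch
from TTS.api import TTS

# Pick the GPU when one is available; CPU inference works but is much slower.
device = "cuda" if torch.cuda.is_available() else "cpu"
tts = TTS(model_name="useclaude/quantum-sync-xtts-v2").to(device)

tts.tts_to_file(
    text="Hello, this is a test.",
    speaker_wav="reference_voice.wav",
    language="en",
    file_path="output_gpu.wav",
)
```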
### Voice Cloning Example
```python
from TTS.api import TTS

# Initialize
tts = TTS(model_name="useclaude/quantum-sync-xtts-v2")

# Clone a voice from reference audio (6-30 seconds)
tts.tts_to_file(
    text="The quick brown fox jumps over the lazy dog.",
    speaker_wav="my_voice_sample.wav",  # your voice reference
    language="en",
    file_path="output_cloned.wav",
)
```
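Cloning quality depends heavily on the reference clip, so it is worth checking that the file is clean speech in the 6-30 second range before synthesis. A small, optional sketch using the `soundfile` package (not a dependency of the model, just a convenience check):

```python
import soundfile as sf

# Inspect the reference clip before cloning.
info = sf.info("my_voice_sample.wav")

print(f"Duration:    {info.duration:.1f} s")
print(f"Sample rate: {info.samplerate} Hz")
print(f"Channels:    {info.channels}")

# XTTS-v2 voice cloning works best with roughly 6-30 seconds of clean speech.
if not 6 <= info.duration <= 30:
    print("Warning: reference clip is outside the recommended 6-30 s range.")
```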
## Performance
From Quantum Sync Production Tests (2025-10-13):
| Metric | Value |
|---|---|
| Synthesis Speed | ~3.7 s per segment (≈16 segments/minute) |
| Processing Time | 17 min for 277 segments (23 min audio) |
| Duration Accuracy | ~87% audio, ~13% silence gaps |
| Timeline Drift | -1.7% (excellent) |
| Voice Quality | 8/10 |
| Cloning Accuracy | Excellent |
| VRAM Usage | 6-8 GB |
Comparison with other engines:

| Engine | Processing Time | Quality | Cost | Notes |
|---|---|---|---|---|
| XTTS-v2 | 15-17 min | 8/10 | Free | 87% audio coverage |
| F5-TTS | 20-25 min | 7/10 | Free | 55% audio coverage |
| AWS Polly | 5 min | 9/10 | ~$0.06 | No voice cloning |
## Advanced Parameters
```python
# Speed control (0.5 - 2.0)
tts.tts_to_file(
    text="Hello world",
    speaker_wav="voice.wav",
    language="en",
    speed=0.8,  # slower speech
    file_path="output.wav",
)

# Temperature control (0.1 - 1.0)
tts.tts_to_file(
    text="Hello world",
    speaker_wav="voice.wav",
    language="en",
    temperature=0.75,  # more expressive
    file_path="output.wav",
)
```
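Both parameters can be combined in a single call; the values below are arbitrary illustrations within the documented ranges:

```python
# Combined speed and temperature control
tts.tts_to_file(
    text="Hello world",
    speaker_wav="voice.wav",
    language="en",
    speed=1.2,         # slightly faster delivery (range 0.5 - 2.0)
    temperature=0.65,  # slightly more conservative sampling (range 0.1 - 1.0)
    file_path="output_tuned.wav",
)
```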
## Model Files
```
quantum-sync-xtts-v2/
├── model.pth          (1.87 GB - neural network weights)
├── config.json        (model configuration)
├── vocab.json         (vocabulary for tokenization)
├── speakers_xtts.pth  (speaker embeddings)
├── dvae.pth           (DVAE component)
├── mel_stats.pth      (mel-spectrogram statistics)
├── LICENSE            (MPL 2.0)
└── README.md          (this file)
```
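If you want the files on disk without going through the TTS library's loader, the `huggingface_hub` package can mirror the repository locally. A sketch, assuming this mirror's repo id:

```python
from huggingface_hub import snapshot_download

# Download every file listed above into a local directory.
local_dir = snapshot_download(
    repo_id="useclaude/quantum-sync-xtts-v2",
    local_dir="models/quantum-sync-xtts-v2",
)
print(f"Model files downloaded to: {local_dir}")
```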
## License
Mozilla Public License 2.0 (MPL 2.0)
This model is licensed under the Mozilla Public License 2.0. You can:
- Use commercially
- Modify the model
- Distribute the model
- Use in proprietary software

Requirements:
- Include the license and copyright notice
- State the changes you make if you modify the model
- Make the source of modified MPL-licensed files available under the MPL

Full license text: see the LICENSE file in this repository.
## Attribution
Original Work:
- Project: Coqui TTS
- Model: XTTS-v2
- Authors: Coqui TTS Team
- License: Mozilla Public License 2.0
This Mirror:
- Purpose: Backup for Quantum Sync project
- Maintained by: [Your Name/Organization]
- Original Source: https://huggingface.co/coqui/XTTS-v2
All credit goes to the original Coqui TTS team. This is simply a mirror for backup and convenience.
## Documentation

Quantum Sync Documentation:
- Quantum Sync project repository: https://github.com/Useforclaude/quantum-sync-v5

Original Documentation:
- Coqui TTS GitHub: https://github.com/coqui-ai/TTS
- XTTS-v2 paper (if available)
## Links
- This Mirror: https://huggingface.co/useclaude/quantum-sync-xtts-v2
- Original Model: https://huggingface.co/coqui/XTTS-v2
- Quantum Sync Project: https://github.com/Useforclaude/quantum-sync-v5
- TTS Library: https://github.com/coqui-ai/TTS
## Disclaimer
This is an unofficial mirror maintained for backup purposes. For the latest version and official support, please refer to the original model and Coqui TTS repository.
## Model Card

### Model Description
XTTS-v2 is a state-of-the-art zero-shot multi-lingual text-to-speech model that can clone voices from short audio samples (6-30 seconds).
Key Features:
- Zero-shot voice cloning
- Multilingual support (14 languages)
- High-quality natural speech
- No fine-tuning required
- Commercial use allowed
### Intended Use
Primary Use Cases:
- Voice cloning for content creation
- Multi-lingual speech synthesis
- Accessibility applications
- Audiobook narration
- Video dubbing
Out-of-Scope Use:
- Impersonation without consent
- Generating misleading content
- Illegal activities
### Training Data
XTTS-v2 was trained on diverse multi-lingual speech data. For details, see the original model card.
### Performance

See the Performance section above for detailed benchmarks from the Quantum Sync project.
### Ethical Considerations
Voice Cloning Ethics:
- Always obtain consent before cloning someone's voice
- Clearly label AI-generated content
- Do not use for impersonation or fraud
- Follow local regulations on synthetic media
### Limitations
- May not perfectly preserve all voice characteristics
- Quality varies with reference audio quality
- Requires GPU for reasonable speed
- ~6-8 GB VRAM recommended
- Some languages may have better quality than others
## Technical Specifications

Model Type: Autoregressive, Transformer-based TTS
Framework: PyTorch
Input: Text + reference audio (6-30 s WAV)
Output: 24 kHz WAV audio
Inference Time: ~3-5 seconds per segment (GPU)
Hardware Requirements:
- GPU: NVIDIA with CUDA support
- VRAM: 6-8 GB recommended
- RAM: 16 GB
- Disk: ~2 GB for model
Software Requirements:
- Python 3.9+
- PyTorch 2.0+
- TTS library
- CUDA 11.8+ (for GPU)
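A quick, optional way to check an environment against the requirements above before running synthesis:

```python
import sys
import torch

# Minimal environment check against the requirements listed above.
print(f"Python:  {sys.version.split()[0]} (need 3.9+)")
print(f"PyTorch: {torch.__version__} (need 2.0+)")

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / 1024**3
    print(f"GPU: {props.name}, VRAM: {vram_gb:.1f} GB (6-8 GB recommended)")
else:
    print("No CUDA GPU detected; XTTS-v2 inference will be very slow on CPU.")
```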
## Support

For this mirror:
- Issues: Quantum Sync GitHub Issues

For the original model:
- Issues: Coqui TTS GitHub Issues
Last Updated: 2025-10-13
Mirror Version: 1.0
Model Version: XTTS-v2 (Latest as of upload date)