---
license: apache-2.0
title: ebookChatterBox
sdk: gradio
sdk_version: 5.34.0
---

# 🎧 Chatterbox Audiobook Generator

**This is a work in progress. Consider this a pre-launch repo for now; if you find bugs, please report them in the Issues area. Thank you.**

**Transform your text into high-quality audiobooks with advanced TTS models, voice cloning, and professional volume normalization.**

## 🚀 Quick Start

### 1. Install Dependencies
```bash
./install-audiobook.bat
```

### 2. Launch the Application
```bash
./launch_audiobook.bat
```

The web interface will open automatically in your browser at `http://localhost:7860`.

### 3. CUDA Issue Fix (If Needed)
If you encounter CUDA assertion errors during generation, install the patched version:
```bash
# Activate your virtual environment first
venv\Scripts\activate.bat

# Install the CUDA-fixed version
pip install --force-reinstall --no-cache-dir "chatterbox-tts @ git+https://github.com/fakerybakery/better-chatterbox@fix-cuda-issue"
```

---

## ✨ Features

### 📚 **Audiobook Creation**
- **Single Voice**: Generate entire audiobooks with one consistent voice
- **Multi-Voice**: Create dynamic audiobooks with multiple characters
- **Custom Voices**: Clone voices from audio samples for personalized narration
- **Professional Volume Normalization**: Ensure consistent audio levels across all voices
- **📋 Text Queuing System** ⭐ *NEW*: Upload books in chapters of any size and generate continuously
- **🔄 Chunk-Based Processing** ⭐ *NEW*: Improved reliability for longer text generations

### 🎵 **Audio Processing**
- **Smart Cleanup**: Remove unwanted silence and audio artifacts
- **Volume Normalization**: Professional-grade volume balancing for all voices
- **Real-time Audio Analysis**: Live volume level monitoring and feedback
- **Preview System**: Test settings before applying to entire projects
- **Batch Processing**: Process multiple projects efficiently
- **Quality Control**: Advanced audio
optimization tools
- **🎯 Enhanced Audio Quality** ⭐ *NEW*: Improved top-p and min-p sampling parameters for better voice generation

### 🎭 **Voice Management**
- **Voice Library**: Organize and manage your voice collection
- **Voice Cloning**: Create custom voices from audio samples
- **Volume Settings**: Configure target volume levels for each voice
- **Professional Presets**: Industry-standard volume levels (audiobook, podcast, broadcast)
- **Character Assignment**: Map specific voices to story characters

### 📊 **Volume Normalization System** ⭐ *NEW*
- **Professional Standards**: Audiobook (-18 dB), Podcast (-16 dB), Broadcast (-23 dB) presets
- **Consistent Character Voices**: All characters maintain the same volume level
- **Real-time Analysis**: Color-coded volume status with RMS and peak level display
- **Retroactive Normalization**: Apply volume settings to existing voice projects
- **Multi-Voice Support**: Batch normalize all voices in multi-character audiobooks
- **Soft Limiting**: Intelligent audio limiting to prevent distortion

### 📖 **Text Processing**
- **Chapter Support**: Automatic chapter detection and organization
- **Multi-Voice Parsing**: Parse character dialogue automatically
- **Text Validation**: Ensure proper formatting before generation
- **📋 Queue Management** ⭐ *NEW*: Batch process multiple text files sequentially
- **🔇 Return Pause System** ⭐ *NEW*: Automatic pause insertion based on line breaks for natural speech flow

---

## 🎭 Custom Audiobook Processing Pipeline ⭐ *NEW*

Our advanced text processing pipeline transforms your written content into natural-sounding audiobooks with intelligent pause placement and character flow management.

### 🔇 **Return Pause System**
**Automatic pause insertion based on your text formatting**: every line break (`\n`) in your text automatically adds a 0.1-second pause to the generated audio, creating natural speech rhythms without manual intervention.
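The rule above is simple enough to sketch in a few lines. This is a minimal illustration, not the project's actual implementation: the function names `count_line_breaks`, `pause_seconds`, and `silence_samples` are hypothetical, and the 24 kHz sample rate is an assumed value. Only the 0.1-second-per-line-break figure comes from this README.

```python
PAUSE_PER_BREAK_S = 0.1        # from this README: each line break adds 0.1 s
ASSUMED_SAMPLE_RATE = 24_000   # assumption for illustration only


def count_line_breaks(text: str) -> int:
    """Count every newline character, including consecutive ones."""
    return text.count("\n")


def pause_seconds(text: str, per_break: float = PAUSE_PER_BREAK_S) -> float:
    """Total pause implied by the line breaks in `text`, in seconds."""
    return round(count_line_breaks(text) * per_break, 3)


def silence_samples(seconds: float, sample_rate: int = ASSUMED_SAMPLE_RATE) -> int:
    """Number of zero-valued samples needed for `seconds` of silence."""
    return round(seconds * sample_rate)


if __name__ == "__main__":
    script = '[Narrator] The sun was setting.\n\n[Character1] "We need shelter."\n'
    total = pause_seconds(script)
    print(f"{count_line_breaks(script)} line breaks -> {total}s of pause "
          f"({silence_samples(total)} samples)")
```

Because consecutive line breaks simply add up, a blank line between paragraphs (two returns) contributes 0.2 seconds under this rule.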
#### **How It Works**
- **Line Break Detection**: System automatically counts all line breaks in your text
- **Pause Calculation**: Each return adds exactly 0.1 seconds of silence
- **Accumulative Pauses**: Multiple consecutive line breaks create longer pauses
- **Universal Support**: Works with single-voice, multi-voice, and batch processing

#### **Example Text Formatting**
```
[Narrator] The sun was setting over the hills.

[Character1] "We need to find shelter soon."

[Character2] "I see a cave up ahead. Let's hurry before it gets dark."

[Narrator] They rushed toward the cave, hearts pounding.
```

**Result**: Natural pauses between dialogue, emphasis pauses for dramatic effect, and smooth character transitions.

### 📝 **Text Formatting Best Practices**

#### **🎭 Multi-Voice Dialogue Structure**
```
[Character Name] Dialogue content here.
[Another Character] Response content here.
Multiple lines can be used for the same character.
[Narrator] Descriptive text and scene setting.
```

#### **🎪 Natural Flow Techniques**
- **Paragraph Breaks**: Use double line breaks for scene transitions
- **Emphasis Pauses**: Add extra returns before important revelations
- **Character Separation**: Single returns between different speakers
- **Breathing Room**: Natural pauses for complex concepts or emotional moments

#### **📖 Single Voice Formatting**
```
Chapter content flows naturally here.

New paragraphs create natural pauses.


Extended pauses can emphasize dramatic moments.

Regular text continues with normal pacing.
```

### 🔄 **Processing Pipeline Features**

#### **🧠 Intelligent Text Analysis**
- **Line Break Preservation**: Maintains your formatting intentions throughout processing
- **Character Assignment**: Automatically maps voice tags to selected voice profiles
- **Chunk Optimization**: Breaks long texts into optimal segments while preserving pause timing
- **Error Recovery**: Validates text and provides helpful formatting suggestions

#### **⚡ Real-Time Processing**
- **Live Feedback**: Console output shows exactly how many pauses are being added
- **Debug Information**: Detailed logging of pause detection and application
- **Progress Tracking**: Monitor pause processing alongside audio generation
- **Quality Assurance**: Automatic validation of pause placement

#### **🎚️ Professional Output**
- **Seamless Integration**: Pauses blend naturally with generated speech
- **Volume Consistency**: Silence segments match the audio output specifications
- **Format Compatibility**: Works with all supported audio formats and quality settings
- **Project Preservation**: Pause information saved in project metadata for regeneration

### 💡 **Pro Tips for Better Audiobooks**

#### **🎯 Dialogue Formatting**
- **Character Consistency**: Always use the same character name format `[Name]`
- **Natural Breaks**: Place returns where a human reader would naturally pause
- **Scene Transitions**: Use multiple returns (2-3) for major scene changes
- **Emotional Beats**: Add single returns before/after emotional dialogue

#### **📚 Chapter Structure**
```
Chapter 1: The Beginning

Opening paragraph with scene setting.

"Character dialogue with natural flow."

Descriptive narrative continues.



Major scene transition with extended pause.

New section begins here.
```

#### **🎪 Advanced Techniques**
- **Cliffhangers**: Use extended pauses before revealing crucial information
- **Action Sequences**: Shorter, punchy sentences with minimal pauses for intensity
- **Contemplative Moments**: Longer pauses for reflection and character development
- **Comedic Timing**: Strategic pauses before punchlines or comedic reveals

### 🔍 **Debug Output Examples**
When generating your audiobook, watch for these helpful console messages:
```
🔇 Detected 15 line breaks → 1.5s total pause time
🔇 Line breaks detected in [Character1]: +0.3s pause (from 3 returns)
🔇 Chunk 2 (Narrator): Added 0.2s pause after speech
```

This real-time feedback helps you understand exactly how your formatting translates to audio timing.

---

## 🆕 Recent Improvements

### 🎯 **Audio Quality Enhancements**
We've significantly improved audio generation quality by optimizing the underlying TTS parameters:
- **Enhanced Top-p and Min-p Settings**: Fine-tuned sampling parameters for more natural speech patterns
- **Reduced Audio Artifacts**: Better handling of pronunciation and intonation
- **Improved Voice Consistency**: More stable voice characteristics across long generations
- **Better Pronunciation**: Enhanced handling of complex words and names

**📝 Note for Existing Users**:
- Older voice profiles will continue to work as before
- To take advantage of the new audio quality improvements, consider re-creating voice profiles
- Existing projects remain fully compatible

### 📋 **Text Queuing System**
Perfect for processing large books or multiple chapters:
- **Batch Upload**: Upload multiple text files of any size
- **Sequential Processing**: Automatically processes files one after another
- **Progress Tracking**: Monitor generation progress across all queued items
- **Flexible Chapter Sizes**: No restrictions on individual file length
- **Unattended Generation**: Set up large projects and let them run automatically

### 🔄 **Chunk-Based TTS System**
Enhanced the core text-to-speech engine for better reliability:
- **Background Chunking**: Automatically splits long texts into optimal chunks
- **Memory Management**: Better handling of large text inputs
- **Error Recovery**: Improved resilience during long generation sessions
- **Consistent Quality**: Maintains voice quality across chunk boundaries
- **Progress Feedback**: Real-time updates on generation progress

---

## 🎚️ Volume Normalization Guide

### **Individual Voice Setup**
1. Go to the **Voice Library** tab
2. Upload your voice sample and configure settings
3. Set the target volume level (default: -18 dB for audiobooks)
4. Choose from professional presets or use custom levels
5. Save the voice profile with its volume settings

### **Multi-Voice Projects**
1. Navigate to the **Multi-Voice Audiobook Creation** tab
2. Enable volume normalization for all voices
3. Set the target level for consistent character voices
4. All characters will be automatically normalized during generation

### **Text Queuing Workflow** ⭐ *NEW*
1. Go to the **Production Studio** tab
2. Select "Batch Processing" mode
3. Upload multiple text files (chapters, sections, etc.)
4. Choose your voice and settings
5. Start batch processing; files will generate sequentially
6. Monitor progress and download completed audiobooks

### **Professional Standards**
- **📖 Audiobook Standard**: -18 dB RMS (recommended for most audiobooks)
- **🎙️ Podcast Standard**: -16 dB RMS (for podcast-style content)
- **🔇 Quiet/Comfortable**: -20 dB RMS (for quiet listening environments)
- **🔊 Loud/Energetic**: -14 dB RMS (for dynamic, energetic content)
- **📺 Broadcast Standard**: -23 dB RMS (for broadcast television standards)

---

## 📁 Project Structure
```
📦 Your Audiobook Projects
├── 🎤 speakers/            # Voice library and samples
├── 📚 audiobook_projects/  # Generated audiobooks
├── 🔧 src/audiobook/       # Core processing modules
└── 📄 Generated files...   # Audio chunks and final outputs
```

---

## 🎯 Workflow

1. **📝 Prepare Text**: Format your story with proper chapter breaks and strategic line breaks for natural pauses
2. **🎤 Select Voices**: Choose or clone voices for your characters
3. **🎚️ Configure Volume**: Set professional volume levels and normalization
4. **⚙️ Configure Settings**: Adjust quality, speed, and processing options
5. **🎧 Generate Audio**: Create your audiobook with advanced TTS and automatic pause insertion
6. **🧹 Clean & Optimize**: Use smart cleanup tools for perfect audio
7. **📦 Export**: Get your finished audiobook ready for distribution

### 🎭 **Enhanced Multi-Voice Workflow**
1. **📝 Format Dialogue**: Use `[Character]` tags and strategic line breaks for natural flow
2. **🔇 Add Return Pauses**: Place line breaks where you want natural speech pauses (0.1s each)
3. **🎤 Assign Voices**: Map each character to their voice profile
4. **⚡ Process with Intelligence**: Watch console output for pause detection feedback
5. **🎧 Review & Adjust**: Listen to generated audio and refine formatting if needed

### 📋 **Batch Processing Workflow** ⭐ *NEW*
1. **📚 Organize Chapters**: Split your book into individual text files
2. **📋 Queue Setup**: Upload all files to the batch processing system
3. **🎤 Voice Selection**: Choose voice and configure settings once
4. **🔄 Automated Generation**: Let the system process all files sequentially
5. **📊 Monitor Progress**: Track completion status in real-time
6. **📦 Collect Results**: Download all generated audiobook chapters

---

## 🛠️ Technical Requirements

- **Python 3.8+**
- **CUDA GPU** (recommended for faster processing)
- **8GB+ RAM** (16GB recommended for large projects)
- **Modern web browser** for the interface

### 🔧 **CUDA Support**
- CUDA compatibility issues have been resolved with updated dependencies
- GPU acceleration is now stable for extended generation sessions
- Fallback to CPU processing is available if CUDA issues occur
- **If you encounter CUDA assertion errors**: Use the patched version from the installation instructions above
- The fix addresses PyTorch indexing issues that could cause crashes during audio generation

---

## ⚠️ Known Issues & Compatibility

### **Multi-Voice Generation**
- Short sentences or sections may occasionally cause issues during multi-voice generation
- This is a limitation of the underlying TTS models rather than the implementation
- **Workaround**: Use longer, more detailed sentences for better stability
- Single-voice generation is not affected by this issue

### **Voice Profile Compatibility**
- **Existing Voices**: All older voice profiles remain fully functional
- **New Features**: To benefit from improved audio quality, consider re-creating voice profiles
- **Project Compatibility**: Existing audiobook projects work without modification
- **Regeneration**: Individual chunks can be regenerated with improved quality settings

### **Batch Processing Considerations**
- Large batch jobs may take significant time depending on text length and hardware
- Monitor system resources during extended batch processing sessions
- Consider processing very large books in smaller batches for better control

---

## 📋 Supported Formats

### Input
- **Text**: `.txt`, `.md`, formatted stories and scripts
- **Audio Samples**: `.wav`, `.mp3`, `.flac` for voice cloning
- **Batch Files**: Multiple text files for queue processing

### Output
- **Audio**: High-quality `.wav` files with
professional volume levels
- **Projects**: Organized folder structure with chapters
- **Exports**: Ready-to-use audiobook files
- **Batch Results**: Multiple completed audiobooks from queue processing

---

## 🆘 Support

- **Features Guide**: See `AUDIOBOOK_FEATURES.md` for detailed capabilities
- **Development Notes**: Check `development/` folder for technical details
- **Issues**: Report problems via GitHub issues

---

## 📄 License

This project is licensed under the terms specified in `LICENSE`.

---

**🎉 Ready to create amazing audiobooks with professional volume levels and enhanced audio quality? Run `./launch_audiobook.bat` and start generating!**