jujutechnology committed on
Commit ad6d77a · verified · 1 Parent(s): b86cad2

Update README.md

Files changed (1)
  1. README.md +373 -368
README.md CHANGED
@@ -1,368 +1,373 @@
1
- # 🎧 Chatterbox Audiobook Generator
2
-
3
- **This is a work in progress. You can consider this a pre-launch repo at the moment, but if you find bugs, please put them in the issues area. Thank you.**
4
- **Transform your text into high-quality audiobooks with advanced TTS models, voice cloning, and professional volume normalization.**
5
-
6
- ## 🚀 Quick Start
7
-
8
- ### 1. Install Dependencies
9
- ```bash
10
- ./install-audiobook.bat
11
- ```
12
-
13
- ### 2. Launch the Application
14
- ```bash
15
- ./launch_audiobook.bat
16
- ```
17
-
18
- ### 3. CUDA Issue Fix (If Needed)
19
- If you encounter CUDA assertion errors during generation, install the patched version:
20
- ```bash
21
- # Activate your virtual environment first
22
- venv\Scripts\activate.bat
23
-
24
- # Install the CUDA-fixed version
25
- pip install --force-reinstall --no-cache-dir "chatterbox-tts @ git+https://github.com/fakerybakery/better-chatterbox@fix-cuda-issue"
26
- ```
27
-
28
- The web interface will open automatically in your browser at `http://localhost:7860`
29
-
30
- ---
31
-
32
- ## ✨ Features
33
-
34
- ### 📚 **Audiobook Creation**
35
- - **Single Voice**: Generate entire audiobooks with one consistent voice
36
- - **Multi-Voice**: Create dynamic audiobooks with multiple characters
37
- - **Custom Voices**: Clone voices from audio samples for personalized narration
38
- - **Professional Volume Normalization**: Ensure consistent audio levels across all voices
39
- - **📋 Text Queuing System** ⭐ *NEW*: Upload books in chapters of any size and generate continuously
40
- - **🔄 Chunk-Based Processing** ⭐ *NEW*: Improved reliability for longer text generations
41
-
42
- ### 🎡 **Audio Processing**
43
- - **Smart Cleanup**: Remove unwanted silence and audio artifacts
44
- - **Volume Normalization**: Professional-grade volume balancing for all voices
45
- - **Real-time Audio Analysis**: Live volume level monitoring and feedback
46
- - **Preview System**: Test settings before applying to entire projects
47
- - **Batch Processing**: Process multiple projects efficiently
48
- - **Quality Control**: Advanced audio optimization tools
49
- - **🎯 Enhanced Audio Quality** ⭐ *NEW*: Improved top-p and min-p sampling parameters for better voice generation
50
-
51
- ### 🎭 **Voice Management**
52
- - **Voice Library**: Organize and manage your voice collection
53
- - **Voice Cloning**: Create custom voices from audio samples
54
- - **Volume Settings**: Configure target volume levels for each voice
55
- - **Professional Presets**: Industry-standard volume levels (audiobook, podcast, broadcast)
56
- - **Character Assignment**: Map specific voices to story characters
57
-
58
- ### 📊 **Volume Normalization System** ⭐ *NEW*
59
- - **Professional Standards**: Audiobook (-18 dB), Podcast (-16 dB), Broadcast (-23 dB) presets
60
- - **Consistent Character Voices**: All characters maintain the same volume level
61
- - **Real-time Analysis**: Color-coded volume status with RMS and peak level display
62
- - **Retroactive Normalization**: Apply volume settings to existing voice projects
63
- - **Multi-Voice Support**: Batch normalize all voices in multi-character audiobooks
64
- - **Soft Limiting**: Intelligent audio limiting to prevent distortion
65
-
66
- ### 📖 **Text Processing**
67
- - **Chapter Support**: Automatic chapter detection and organization
68
- - **Multi-Voice Parsing**: Parse character dialogue automatically
69
- - **Text Validation**: Ensure proper formatting before generation
70
- - **📋 Queue Management** ⭐ *NEW*: Batch process multiple text files sequentially
71
- - **🔇 Return Pause System** ⭐ *NEW*: Automatic pause insertion based on line breaks for natural speech flow
72
-
73
- ---
74
-
75
- ## 🎭 Custom Audiobook Processing Pipeline ⭐ *NEW*
76
-
77
- Our advanced text processing pipeline transforms your written content into natural-sounding audiobooks with intelligent pause placement and character flow management.
78
-
79
- ### 🔇 **Return Pause System**
80
-
81
- **Automatic pause insertion based on your text formatting** - Every line break (`\n`) in your text automatically adds a 0.1-second pause to the generated audio, creating natural speech rhythms without manual intervention.
82
-
83
- #### **How It Works**
84
- - **Line Break Detection**: System automatically counts all line breaks in your text
85
- - **Pause Calculation**: Each return adds exactly 0.1 seconds of silence
86
- - **Accumulative Pauses**: Multiple consecutive line breaks create longer pauses
87
- - **Universal Support**: Works with single-voice, multi-voice, and batch processing
88
-
89
- #### **Example Text Formatting**
90
- ```
91
- [Narrator] The sun was setting over the hills.
92
-
93
- [Character1] "We need to find shelter soon."
94
-
95
- [Character2] "I see a cave up ahead.
96
- Let's hurry before it gets dark."
97
-
98
-
99
- [Narrator] They rushed toward the cave, hearts pounding.
100
- ```
101
- **Result**: Natural pauses between dialogue, emphasis pauses for dramatic effect, and smooth character transitions.
102
-
103
- ### 📝 **Text Formatting Best Practices**
104
-
105
- #### **🎭 Multi-Voice Dialogue Structure**
106
- ```
107
- [Character Name] Dialogue content here.
108
-
109
- [Another Character] Response content here.
110
- Multiple lines can be used for the same character.
111
-
112
- [Narrator] Descriptive text and scene setting.
113
- ```
114
-
115
- #### **🎪 Natural Flow Techniques**
116
- - **Paragraph Breaks**: Use double line breaks for scene transitions
117
- - **Emphasis Pauses**: Add extra returns before important revelations
118
- - **Character Separation**: Single returns between different speakers
119
- - **Breathing Room**: Natural pauses for complex concepts or emotional moments
120
-
121
- #### **📖 Single Voice Formatting**
122
- ```
123
- Chapter content flows naturally here.
124
-
125
- New paragraphs create natural pauses.
126
-
127
-
128
- Extended pauses can emphasize dramatic moments.
129
-
130
- Regular text continues with normal pacing.
131
- ```
132
-
133
- ### 🔄 **Processing Pipeline Features**
134
-
135
- #### **🧠 Intelligent Text Analysis**
136
- - **Line Break Preservation**: Maintains your formatting intentions throughout processing
137
- - **Character Assignment**: Automatically maps voice tags to selected voice profiles
138
- - **Chunk Optimization**: Breaks long texts into optimal segments while preserving pause timing
139
- - **Error Recovery**: Validates text and provides helpful formatting suggestions
140
-
141
- #### **⚡ Real-Time Processing**
142
- - **Live Feedback**: Console output shows exactly how many pauses are being added
143
- - **Debug Information**: Detailed logging of pause detection and application
144
- - **Progress Tracking**: Monitor pause processing alongside audio generation
145
- - **Quality Assurance**: Automatic validation of pause placement
146
-
147
- #### **🎚️ Professional Output**
148
- - **Seamless Integration**: Pauses blend naturally with generated speech
149
- - **Volume Consistency**: Silence segments match the audio output specifications
150
- - **Format Compatibility**: Works with all supported audio formats and quality settings
151
- - **Project Preservation**: Pause information saved in project metadata for regeneration
152
-
153
- ### 💡 **Pro Tips for Better Audiobooks**
154
-
155
- #### **🎯 Dialogue Formatting**
156
- - **Character Consistency**: Always use the same character name format `[Name]`
157
- - **Natural Breaks**: Place returns where a human reader would naturally pause
158
- - **Scene Transitions**: Use multiple returns (2-3) for major scene changes
159
- - **Emotional Beats**: Add single returns before/after emotional dialogue
160
-
161
- #### **📚 Chapter Structure**
162
- ```
163
- Chapter 1: The Beginning
164
-
165
- Opening paragraph with scene setting.
166
-
167
- "Character dialogue with natural flow."
168
-
169
- Descriptive narrative continues.
170
-
171
-
172
- Major scene transition with extended pause.
173
-
174
- New section begins here.
175
- ```
176
-
177
- #### **🎪 Advanced Techniques**
178
- - **Cliffhangers**: Use extended pauses before revealing crucial information
179
- - **Action Sequences**: Shorter, punchy sentences with minimal pauses for intensity
180
- - **Contemplative Moments**: Longer pauses for reflection and character development
181
- - **Comedic Timing**: Strategic pauses before punchlines or comedic reveals
182
-
183
- ### 🔍 **Debug Output Examples**
184
-
185
- When generating your audiobook, watch for these helpful console messages:
186
- ```
187
- 🔇 Detected 15 line breaks → 1.5s total pause time
188
- 🔇 Line breaks detected in [Character1]: +0.3s pause (from 3 returns)
189
- 🔇 Chunk 2 (Narrator): Added 0.2s pause after speech
190
- ```
191
-
192
- This real-time feedback helps you understand exactly how your formatting translates to audio timing.
193
-
194
- ---
195
-
196
- ## 🆕 Recent Improvements
197
-
198
- ### 🎯 **Audio Quality Enhancements**
199
- We've significantly improved audio generation quality by optimizing the underlying TTS parameters:
200
-
201
- - **Enhanced Top-p and Min-p Settings**: Fine-tuned sampling probability parameters for more natural speech patterns
202
- - **Reduced Audio Artifacts**: Better handling of pronunciation and intonation
203
- - **Improved Voice Consistency**: More stable voice characteristics across long generations
204
- - **Better Pronunciation**: Enhanced handling of complex words and names
205
-
206
- **📝 Note for Existing Users**:
207
- - Older voice profiles will continue to work as before
208
- - To take advantage of the new audio quality improvements, consider re-creating voice profiles
209
- - Existing projects remain fully compatible
210
-
211
- ### 📋 **Text Queuing System**
212
- Perfect for processing large books or multiple chapters:
213
-
214
- - **Batch Upload**: Upload multiple text files of any size
215
- - **Sequential Processing**: Automatically processes files one after another
216
- - **Progress Tracking**: Monitor generation progress across all queued items
217
- - **Flexible Chapter Sizes**: No restrictions on individual file length
218
- - **Unattended Generation**: Set up large projects and let them run automatically
219
-
220
- ### 🔄 **Chunk-Based TTS System**
221
- Enhanced the core text-to-speech engine for better reliability:
222
-
223
- - **Background Chunking**: Automatically splits long texts into optimal chunks
224
- - **Memory Management**: Better handling of large text inputs
225
- - **Error Recovery**: Improved resilience during long generation sessions
226
- - **Consistent Quality**: Maintains voice quality across chunk boundaries
227
- - **Progress Feedback**: Real-time updates on generation progress
228
-
229
- ---
230
-
231
- ## 🎚️ Volume Normalization Guide
232
-
233
- ### **Individual Voice Setup**
234
- 1. Go to **Voice Library** tab
235
- 2. Upload your voice sample and configure settings
236
- 3. Set target volume level (default: -18 dB for audiobooks)
237
- 4. Choose from professional presets or use custom levels
238
- 5. Save voice profile with volume settings
239
-
240
- ### **Multi-Voice Projects**
241
- 1. Navigate to **Multi-Voice Audiobook Creation** tab
242
- 2. Enable volume normalization for all voices
243
- 3. Set target level for consistent character voices
244
- 4. All characters will be automatically normalized during generation
245
-
246
- ### **Text Queuing Workflow** ⭐ *NEW*
247
- 1. Go to **Production Studio** tab
248
- 2. Select "Batch Processing" mode
249
- 3. Upload multiple text files (chapters, sections, etc.)
250
- 4. Choose your voice and settings
251
- 5. Start batch processing - files will generate sequentially
252
- 6. Monitor progress and download completed audiobooks
253
-
254
- ### **Professional Standards**
255
- - **📖 Audiobook Standard**: -18 dB RMS (recommended for most audiobooks)
256
- - **🎙️ Podcast Standard**: -16 dB RMS (for podcast-style content)
257
- - **🔇 Quiet/Comfortable**: -20 dB RMS (for quiet listening environments)
258
- - **🔊 Loud/Energetic**: -14 dB RMS (for dynamic, energetic content)
259
- - **📺 Broadcast Standard**: -23 dB RMS (for broadcast television standards)
260
-
261
- ---
262
-
263
- ## 📁 Project Structure
264
-
265
- ```
266
- 📦 Your Audiobook Projects
267
- ├── 🎀 speakers/ # Voice library and samples
268
- ├── 📚 audiobook_projects/ # Generated audiobooks
269
- ├── 🔧 src/audiobook/ # Core processing modules
270
- └── 📄 Generated files... # Audio chunks and final outputs
271
- ```
272
-
273
- ---
274
-
275
- ## 🎯 Workflow
276
-
277
- 1. **📝 Prepare Text**: Format your story with proper chapter breaks and strategic line breaks for natural pauses
278
- 2. **🎀 Select Voices**: Choose or clone voices for your characters
279
- 3. **🎚️ Configure Volume**: Set professional volume levels and normalization
280
- 4. **⚙️ Configure Settings**: Adjust quality, speed, and processing options
281
- 5. **🎧 Generate Audio**: Create your audiobook with advanced TTS and automatic pause insertion
282
- 6. **🧹 Clean & Optimize**: Use smart cleanup tools for perfect audio
283
- 7. **📦 Export**: Get your finished audiobook ready for distribution
284
-
285
- ### 🎭 **Enhanced Multi-Voice Workflow**
286
- 1. **📝 Format Dialogue**: Use `[Character]` tags and strategic line breaks for natural flow
287
- 2. **🔇 Add Return Pauses**: Place line breaks where you want natural speech pauses (0.1s each)
288
- 3. **🎀 Assign Voices**: Map each character to their voice profile
289
- 4. **⚡ Process with Intelligence**: Watch console output for pause detection feedback
290
- 5. **🎧 Review & Adjust**: Listen to generated audio and refine formatting if needed
291
-
292
- ### 📋 **Batch Processing Workflow** ⭐ *NEW*
293
- 1. **📚 Organize Chapters**: Split your book into individual text files
294
- 2. **📋 Queue Setup**: Upload all files to the batch processing system
295
- 3. **🎀 Voice Selection**: Choose voice and configure settings once
296
- 4. **🔄 Automated Generation**: Let the system process all files sequentially
297
- 5. **📊 Monitor Progress**: Track completion status in real-time
298
- 6. **📦 Collect Results**: Download all generated audiobook chapters
299
-
300
- ---
301
-
302
- ## 🛠️ Technical Requirements
303
-
304
- - **Python 3.8+**
305
- - **CUDA GPU** (recommended for faster processing)
306
- - **8GB+ RAM** (16GB recommended for large projects)
307
- - **Modern web browser** for the interface
308
-
309
- ### 🔧 **CUDA Support**
310
- - CUDA compatibility issues have been resolved with updated dependencies
311
- - GPU acceleration is now stable for extended generation sessions
312
- - Fallback to CPU processing available if CUDA issues occur
313
- - **If you encounter CUDA assertion errors**: Use the patched version from the installation instructions above
314
- - The fix addresses PyTorch indexing issues that could cause crashes during audio generation
315
-
316
- ---
317
-
318
- ## ⚠️ Known Issues & Compatibility
319
-
320
- ### **Multi-Voice Generation**
321
- - Short sentences or sections may occasionally cause issues during multi-voice generation
322
- - This is a limitation of the underlying TTS models rather than the implementation
323
- - **Workaround**: Use longer, more detailed sentences for better stability
324
- - Single-voice generation is not affected by this issue
325
-
326
- ### **Voice Profile Compatibility**
327
- - **Existing Voices**: All older voice profiles remain fully functional
328
- - **New Features**: To benefit from improved audio quality, consider re-creating voice profiles
329
- - **Project Compatibility**: Existing audiobook projects work without modification
330
- - **Regeneration**: Individual chunks can be regenerated with improved quality settings
331
-
332
- ### **Batch Processing Considerations**
333
- - Large batch jobs may take significant time depending on text length and hardware
334
- - Monitor system resources during extended batch processing sessions
335
- - Consider processing very large books in smaller batches for better control
336
-
337
- ---
338
-
339
- ## 📋 Supported Formats
340
-
341
- ### Input
342
- - **Text**: `.txt`, `.md`, formatted stories and scripts
343
- - **Audio Samples**: `.wav`, `.mp3`, `.flac` for voice cloning
344
- - **Batch Files**: Multiple text files for queue processing
345
-
346
- ### Output
347
- - **Audio**: High-quality `.wav` files with professional volume levels
348
- - **Projects**: Organized folder structure with chapters
349
- - **Exports**: Ready-to-use audiobook files
350
- - **Batch Results**: Multiple completed audiobooks from queue processing
351
-
352
- ---
353
-
354
- ## 🆘 Support
355
-
356
- - **Features Guide**: See `AUDIOBOOK_FEATURES.md` for detailed capabilities
357
- - **Development Notes**: Check `development/` folder for technical details
358
- - **Issues**: Report problems via GitHub issues
359
-
360
- ---
361
-
362
- ## 📄 License
363
-
364
- This project is licensed under the terms specified in `LICENSE`.
365
-
366
- ---
367
-
368
- **🎉 Ready to create amazing audiobooks with professional volume levels and enhanced audio quality? Run `./launch_audiobook.bat` and start generating!**
 
 
 
 
 
 
1
+ ---
2
+ license: apache-2.0
3
+ title: ebookChatterBox
4
+ sdk: gradio
5
+ ---
6
+ # 🎧 Chatterbox Audiobook Generator
7
+
8
+ **This is a work in progress. You can consider this a pre-launch repo at the moment, but if you find bugs, please put them in the issues area. Thank you.**
9
+ **Transform your text into high-quality audiobooks with advanced TTS models, voice cloning, and professional volume normalization.**
10
+
11
+ ## 🚀 Quick Start
12
+
13
+ ### 1. Install Dependencies
14
+ ```bash
15
+ ./install-audiobook.bat
16
+ ```
17
+
18
+ ### 2. Launch the Application
19
+ ```bash
20
+ ./launch_audiobook.bat
21
+ ```
22
+
23
+ ### 3. CUDA Issue Fix (If Needed)
24
+ If you encounter CUDA assertion errors during generation, install the patched version:
25
+ ```bash
26
+ # Activate your virtual environment first
27
+ venv\Scripts\activate.bat
28
+
29
+ # Install the CUDA-fixed version
30
+ pip install --force-reinstall --no-cache-dir "chatterbox-tts @ git+https://github.com/fakerybakery/better-chatterbox@fix-cuda-issue"
31
+ ```
32
+
33
+ The web interface will open automatically in your browser at `http://localhost:7860`
34
+
35
+ ---
36
+
37
+ ## ✨ Features
38
+
39
+ ### 📚 **Audiobook Creation**
40
+ - **Single Voice**: Generate entire audiobooks with one consistent voice
41
+ - **Multi-Voice**: Create dynamic audiobooks with multiple characters
42
+ - **Custom Voices**: Clone voices from audio samples for personalized narration
43
+ - **Professional Volume Normalization**: Ensure consistent audio levels across all voices
44
+ - **📋 Text Queuing System** ⭐ *NEW*: Upload books in chapters of any size and generate continuously
45
+ - **🔄 Chunk-Based Processing** ⭐ *NEW*: Improved reliability for longer text generations
46
+
47
+ ### 🎡 **Audio Processing**
48
+ - **Smart Cleanup**: Remove unwanted silence and audio artifacts
49
+ - **Volume Normalization**: Professional-grade volume balancing for all voices
50
+ - **Real-time Audio Analysis**: Live volume level monitoring and feedback
51
+ - **Preview System**: Test settings before applying to entire projects
52
+ - **Batch Processing**: Process multiple projects efficiently
53
+ - **Quality Control**: Advanced audio optimization tools
54
+ - **🎯 Enhanced Audio Quality** ⭐ *NEW*: Improved top-p and min-p sampling parameters for better voice generation
55
+
56
+ ### 🎭 **Voice Management**
57
+ - **Voice Library**: Organize and manage your voice collection
58
+ - **Voice Cloning**: Create custom voices from audio samples
59
+ - **Volume Settings**: Configure target volume levels for each voice
60
+ - **Professional Presets**: Industry-standard volume levels (audiobook, podcast, broadcast)
61
+ - **Character Assignment**: Map specific voices to story characters
62
+
63
+ ### 📊 **Volume Normalization System** ⭐ *NEW*
64
+ - **Professional Standards**: Audiobook (-18 dB), Podcast (-16 dB), Broadcast (-23 dB) presets
65
+ - **Consistent Character Voices**: All characters maintain the same volume level
66
+ - **Real-time Analysis**: Color-coded volume status with RMS and peak level display
67
+ - **Retroactive Normalization**: Apply volume settings to existing voice projects
68
+ - **Multi-Voice Support**: Batch normalize all voices in multi-character audiobooks
69
+ - **Soft Limiting**: Intelligent audio limiting to prevent distortion
70
+
71
+ ### 📖 **Text Processing**
72
+ - **Chapter Support**: Automatic chapter detection and organization
73
+ - **Multi-Voice Parsing**: Parse character dialogue automatically
74
+ - **Text Validation**: Ensure proper formatting before generation
75
+ - **📋 Queue Management** ⭐ *NEW*: Batch process multiple text files sequentially
76
+ - **🔇 Return Pause System** ⭐ *NEW*: Automatic pause insertion based on line breaks for natural speech flow
77
+
78
+ ---
79
+
80
+ ## 🎭 Custom Audiobook Processing Pipeline ⭐ *NEW*
81
+
82
+ Our advanced text processing pipeline transforms your written content into natural-sounding audiobooks with intelligent pause placement and character flow management.
83
+
84
+ ### 🔇 **Return Pause System**
85
+
86
+ **Automatic pause insertion based on your text formatting** - Every line break (`\n`) in your text automatically adds a 0.1-second pause to the generated audio, creating natural speech rhythms without manual intervention.
87
+
88
+ #### **How It Works**
89
+ - **Line Break Detection**: System automatically counts all line breaks in your text
90
+ - **Pause Calculation**: Each return adds exactly 0.1 seconds of silence
91
+ - **Accumulative Pauses**: Multiple consecutive line breaks create longer pauses
92
+ - **Universal Support**: Works with single-voice, multi-voice, and batch processing
93
+
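The pause rule described above can be sketched in a few lines of Python. This is a minimal illustration, not code from this repository: the function names `pause_seconds` and `silence_samples` and the 24 kHz sample rate are assumptions made for the example; only the 0.1-second-per-line-break figure comes from the README.

```python
from typing import List

# From the README: each line break adds 0.1 s of silence.
PAUSE_PER_BREAK_S = 0.1


def pause_seconds(text: str, per_break: float = PAUSE_PER_BREAK_S) -> float:
    """Return the total pause time implied by the line breaks in `text`."""
    # Treat Windows (\r\n) and old Mac (\r) line endings as single breaks.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return round(normalized.count("\n") * per_break, 3)


def silence_samples(seconds: float, sample_rate: int = 24000) -> List[float]:
    """Return zero-valued samples lasting `seconds` (sample rate is assumed)."""
    return [0.0] * int(round(seconds * sample_rate))


if __name__ == "__main__":
    text = "First line.\n\n\nSecond line."
    pause = pause_seconds(text)
    print(f"{text.count(chr(10))} line breaks, {pause}s of silence")
```

Because pauses accumulate, three consecutive line breaks in this sketch yield 0.3 seconds of silence, which matches the per-character debug message shown later in the README.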
94
+ #### **Example Text Formatting**
95
+ ```
96
+ [Narrator] The sun was setting over the hills.
97
+
98
+ [Character1] "We need to find shelter soon."
99
+
100
+ [Character2] "I see a cave up ahead.
101
+ Let's hurry before it gets dark."
102
+
103
+
104
+ [Narrator] They rushed toward the cave, hearts pounding.
105
+ ```
106
+ **Result**: Natural pauses between dialogue, emphasis pauses for dramatic effect, and smooth character transitions.
107
+
108
+ ### 📝 **Text Formatting Best Practices**
109
+
110
+ #### **🎭 Multi-Voice Dialogue Structure**
111
+ ```
112
+ [Character Name] Dialogue content here.
113
+
114
+ [Another Character] Response content here.
115
+ Multiple lines can be used for the same character.
116
+
117
+ [Narrator] Descriptive text and scene setting.
118
+ ```
119
+
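To make the `[Character Name]` convention above concrete, here is a hedged sketch of how such tagged text could be split into speaker segments. The function name `parse_dialogue`, the regular expression, and the default `Narrator` speaker for untagged leading text are all assumptions for illustration; the project's actual parser may behave differently.

```python
import re
from typing import List, Tuple

# A tag is a bracketed name at the start of a line, e.g. "[Narrator] ...".
TAG_RE = re.compile(r"^\[([^\]]+)\]\s*(.*)$")


def parse_dialogue(text: str, default_speaker: str = "Narrator") -> List[Tuple[str, str]]:
    """Split tagged text into (speaker, text) segments.

    Lines without a tag are appended to the current speaker's segment.
    """
    segments: List[Tuple[str, str]] = []
    speaker, lines = default_speaker, []

    def flush() -> None:
        body = "\n".join(lines).strip()
        if body:
            segments.append((speaker, body))

    for raw in text.splitlines():
        match = TAG_RE.match(raw.strip())
        if match:
            flush()  # close the previous speaker's segment
            speaker, lines = match.group(1).strip(), [match.group(2)]
        else:
            lines.append(raw)
    flush()
    return segments


if __name__ == "__main__":
    sample = '[Narrator] The sun was setting.\n\n[Character2] "I see a cave.\nLet\'s hurry."'
    for who, what in parse_dialogue(sample):
        print(who, "->", repr(what))
```

Note that this sketch strips blank lines at the edges of each segment, so a pipeline that also applies the return-pause rule would need to count line breaks before stripping.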
120
+ #### **🎪 Natural Flow Techniques**
121
+ - **Paragraph Breaks**: Use double line breaks for scene transitions
122
+ - **Emphasis Pauses**: Add extra returns before important revelations
123
+ - **Character Separation**: Single returns between different speakers
124
+ - **Breathing Room**: Natural pauses for complex concepts or emotional moments
125
+
126
+ #### **📖 Single Voice Formatting**
127
+ ```
128
+ Chapter content flows naturally here.
129
+
130
+ New paragraphs create natural pauses.
131
+
132
+
133
+ Extended pauses can emphasize dramatic moments.
134
+
135
+ Regular text continues with normal pacing.
136
+ ```
137
+
138
+ ### 🔄 **Processing Pipeline Features**
139
+
140
+ #### **🧠 Intelligent Text Analysis**
141
+ - **Line Break Preservation**: Maintains your formatting intentions throughout processing
142
+ - **Character Assignment**: Automatically maps voice tags to selected voice profiles
143
+ - **Chunk Optimization**: Breaks long texts into optimal segments while preserving pause timing
144
+ - **Error Recovery**: Validates text and provides helpful formatting suggestions
145
+
146
+ #### **⚡ Real-Time Processing**
147
+ - **Live Feedback**: Console output shows exactly how many pauses are being added
148
+ - **Debug Information**: Detailed logging of pause detection and application
149
+ - **Progress Tracking**: Monitor pause processing alongside audio generation
150
+ - **Quality Assurance**: Automatic validation of pause placement
151
+
152
+ #### **🎚️ Professional Output**
153
+ - **Seamless Integration**: Pauses blend naturally with generated speech
154
+ - **Volume Consistency**: Silence segments match the audio output specifications
155
+ - **Format Compatibility**: Works with all supported audio formats and quality settings
156
+ - **Project Preservation**: Pause information saved in project metadata for regeneration
157
+
158
+ ### 💡 **Pro Tips for Better Audiobooks**
159
+
160
+ #### **🎯 Dialogue Formatting**
161
+ - **Character Consistency**: Always use the same character name format `[Name]`
162
+ - **Natural Breaks**: Place returns where a human reader would naturally pause
163
+ - **Scene Transitions**: Use multiple returns (2-3) for major scene changes
164
+ - **Emotional Beats**: Add single returns before/after emotional dialogue
165
+
166
+ #### **📚 Chapter Structure**
167
+ ```
168
+ Chapter 1: The Beginning
169
+
170
+ Opening paragraph with scene setting.
171
+
172
+ "Character dialogue with natural flow."
173
+
174
+ Descriptive narrative continues.
175
+
176
+
177
+ Major scene transition with extended pause.
178
+
179
+ New section begins here.
180
+ ```
181
+
182
+ #### **🎪 Advanced Techniques**
183
+ - **Cliffhangers**: Use extended pauses before revealing crucial information
184
+ - **Action Sequences**: Shorter, punchy sentences with minimal pauses for intensity
185
+ - **Contemplative Moments**: Longer pauses for reflection and character development
186
+ - **Comedic Timing**: Strategic pauses before punchlines or comedic reveals
187
+
188
+ ### 🔍 **Debug Output Examples**
189
+
190
+ When generating your audiobook, watch for these helpful console messages:
191
+ ```
192
+ 🔇 Detected 15 line breaks → 1.5s total pause time
193
+ 🔇 Line breaks detected in [Character1]: +0.3s pause (from 3 returns)
194
+ 🔇 Chunk 2 (Narrator): Added 0.2s pause after speech
195
+ ```
196
+
197
+ This real-time feedback helps you understand exactly how your formatting translates to audio timing.
198
+
199
+ ---
200
+
201
+ ## 🆕 Recent Improvements
202
+
203
+ ### 🎯 **Audio Quality Enhancements**
204
+ We've significantly improved audio generation quality by optimizing the underlying TTS parameters:
205
+
206
+ - **Enhanced Top-p and Min-p Settings**: Fine-tuned sampling probability parameters for more natural speech patterns
207
+ - **Reduced Audio Artifacts**: Better handling of pronunciation and intonation
208
+ - **Improved Voice Consistency**: More stable voice characteristics across long generations
209
+ - **Better Pronunciation**: Enhanced handling of complex words and names
210
+
211
+ **📝 Note for Existing Users**:
212
+ - Older voice profiles will continue to work as before
213
+ - To take advantage of the new audio quality improvements, consider re-creating voice profiles
214
+ - Existing projects remain fully compatible
215
+
216
+ ### 📋 **Text Queuing System**
217
+ Perfect for processing large books or multiple chapters:
218
+
219
+ - **Batch Upload**: Upload multiple text files of any size
220
+ - **Sequential Processing**: Automatically processes files one after another
221
+ - **Progress Tracking**: Monitor generation progress across all queued items
222
+ - **Flexible Chapter Sizes**: No restrictions on individual file length
223
+ - **Unattended Generation**: Set up large projects and let them run automatically
224
+
225
+ ### 🔄 **Chunk-Based TTS System**
226
+ Enhanced the core text-to-speech engine for better reliability:
227
+
228
+ - **Background Chunking**: Automatically splits long texts into optimal chunks
229
+ - **Memory Management**: Better handling of large text inputs
230
+ - **Error Recovery**: Improved resilience during long generation sessions
231
+ - **Consistent Quality**: Maintains voice quality across chunk boundaries
232
+ - **Progress Feedback**: Real-time updates on generation progress
233
+
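The background chunking described above could look roughly like the following sketch. It is not the project's implementation: the function name `split_into_chunks`, the sentence-boundary regular expression, and the 300-character limit are assumptions chosen for the example.

```python
import re
from typing import List


def split_into_chunks(text: str, max_chars: int = 300) -> List[str]:
    """Greedily pack whole sentences into chunks of at most `max_chars`.

    A single sentence longer than `max_chars` is kept intact as its own
    chunk rather than being cut mid-sentence.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for i, chunk in enumerate(split_into_chunks("One. Two! Three?", max_chars=9), 1):
        print(i, chunk)
```

Splitting on whitespace discards line breaks inside the text, so a pipeline that also applies the return-pause rule would have to record the break counts before chunking.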
234
+ ---
235
+
236
+ ## 🎚️ Volume Normalization Guide
237
+
238
+ ### **Individual Voice Setup**
239
+ 1. Go to **Voice Library** tab
240
+ 2. Upload your voice sample and configure settings
241
+ 3. Set target volume level (default: -18 dB for audiobooks)
242
+ 4. Choose from professional presets or use custom levels
243
+ 5. Save voice profile with volume settings
244
+
245
+ ### **Multi-Voice Projects**
246
+ 1. Navigate to **Multi-Voice Audiobook Creation** tab
247
+ 2. Enable volume normalization for all voices
248
+ 3. Set target level for consistent character voices
249
+ 4. All characters will be automatically normalized during generation
250
+
251
+ ### **Text Queuing Workflow** ⭐ *NEW*
252
+ 1. Go to **Production Studio** tab
253
+ 2. Select "Batch Processing" mode
254
+ 3. Upload multiple text files (chapters, sections, etc.)
255
+ 4. Choose your voice and settings
256
+ 5. Start batch processing - files will generate sequentially
257
+ 6. Monitor progress and download completed audiobooks
258
+
259
+ ### **Professional Standards**
260
+ - **📖 Audiobook Standard**: -18 dB RMS (recommended for most audiobooks)
261
+ - **🎙️ Podcast Standard**: -16 dB RMS (for podcast-style content)
262
+ - **🔇 Quiet/Comfortable**: -20 dB RMS (for quiet listening environments)
263
+ - **🔊 Loud/Energetic**: -14 dB RMS (for dynamic, energetic content)
264
+ - **📺 Broadcast Standard**: -23 dB RMS (for broadcast television standards)
265
+
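As a rough illustration of how an RMS target such as -18 dB translates into a gain, the sketch below measures a signal's RMS level, scales it toward a preset, and applies a soft limiter. Everything here is an assumption for illustration: the names `PRESETS_DB`, `rms_db`, `soft_limit`, and `normalize`, the tanh limiter, the 0.95 ceiling, and the convention that samples are floats with full scale at 1.0. Only the preset values come from the list above.

```python
import math
from typing import Dict, List

# Preset targets in dB RMS, taken from the list above.
PRESETS_DB: Dict[str, float] = {
    "audiobook": -18.0,
    "podcast": -16.0,
    "quiet": -20.0,
    "loud": -14.0,
    "broadcast": -23.0,
}


def rms_db(samples: List[float]) -> float:
    """Return the RMS level in dB relative to full scale (1.0)."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)


def soft_limit(x: float, ceiling: float = 0.95) -> float:
    """Smoothly compress a sample so its magnitude stays below `ceiling`."""
    return ceiling * math.tanh(x / ceiling)


def normalize(samples: List[float], target_db: float = -18.0) -> List[float]:
    """Scale samples toward `target_db` RMS, then soft-limit the result."""
    current = rms_db(samples)
    if math.isinf(current):
        return list(samples)  # silence: nothing to scale
    gain = 10.0 ** ((target_db - current) / 20.0)
    return [soft_limit(s * gain) for s in samples]


if __name__ == "__main__":
    quiet = [0.01, -0.01] * 100
    out = normalize(quiet, PRESETS_DB["audiobook"])
    print(f"before: {rms_db(quiet):.1f} dB, after: {rms_db(out):.1f} dB")
```

The tanh limiter compresses every sample slightly, so the measured level after normalization lands just under the target rather than exactly on it.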
266
+ ---
267
+
268
+ ## 📁 Project Structure
269
+
270
+ ```
271
+ 📦 Your Audiobook Projects
272
+ ├── 🎀 speakers/ # Voice library and samples
273
+ ├── 📚 audiobook_projects/ # Generated audiobooks
274
+ ├── 🔧 src/audiobook/ # Core processing modules
275
+ └── 📄 Generated files... # Audio chunks and final outputs
276
+ ```
277
+
278
+ ---
279
+
280
+ ## 🎯 Workflow
281
+
282
+ 1. **📝 Prepare Text**: Format your story with proper chapter breaks and strategic line breaks for natural pauses
283
+ 2. **🎀 Select Voices**: Choose or clone voices for your characters
284
+ 3. **🎚️ Configure Volume**: Set professional volume levels and normalization
285
+ 4. **⚙️ Configure Settings**: Adjust quality, speed, and processing options
286
+ 5. **🎧 Generate Audio**: Create your audiobook with advanced TTS and automatic pause insertion
287
+ 6. **🧹 Clean & Optimize**: Use smart cleanup tools for perfect audio
288
+ 7. **📦 Export**: Get your finished audiobook ready for distribution
289
+
290
+ ### 🎭 **Enhanced Multi-Voice Workflow**
291
+ 1. **📝 Format Dialogue**: Use `[Character]` tags and strategic line breaks for natural flow
292
+ 2. **🔇 Add Return Pauses**: Place line breaks where you want natural speech pauses (0.1s each)
293
+ 3. **🎀 Assign Voices**: Map each character to their voice profile
294
+ 4. **⚡ Process with Intelligence**: Watch console output for pause detection feedback
295
+ 5. **🎧 Review & Adjust**: Listen to generated audio and refine formatting if needed
296
+
297
+ ### 📋 **Batch Processing Workflow** ⭐ *NEW*
298
+ 1. **📚 Organize Chapters**: Split your book into individual text files
299
+ 2. **📋 Queue Setup**: Upload all files to the batch processing system
300
+ 3. **🎀 Voice Selection**: Choose voice and configure settings once
301
+ 4. **🔄 Automated Generation**: Let the system process all files sequentially
302
+ 5. **📊 Monitor Progress**: Track completion status in real-time
303
+ 6. **📦 Collect Results**: Download all generated audiobook chapters
304
+
305
+ ---
306
+
307
+ ## 🛠️ Technical Requirements
308
+
309
+ - **Python 3.8+**
310
+ - **CUDA GPU** (recommended for faster processing)
311
+ - **8GB+ RAM** (16GB recommended for large projects)
312
+ - **Modern web browser** for the interface
313
+
314
+ ### 🔧 **CUDA Support**
315
+ - CUDA compatibility issues have been resolved with updated dependencies
316
+ - GPU acceleration is now stable for extended generation sessions
317
+ - Fallback to CPU processing available if CUDA issues occur
318
+ - **If you encounter CUDA assertion errors**: Use the patched version from the installation instructions above
319
+ - The fix addresses PyTorch indexing issues that could cause crashes during audio generation
320
+
321
+ ---
322
+
323
+ ## ⚠️ Known Issues & Compatibility
324
+
325
+ ### **Multi-Voice Generation**
326
+ - Short sentences or sections may occasionally cause issues during multi-voice generation
327
+ - This is a limitation of the underlying TTS models rather than the implementation
328
+ - **Workaround**: Use longer, more detailed sentences for better stability
329
+ - Single-voice generation is not affected by this issue
330
+
331
+ ### **Voice Profile Compatibility**
332
+ - **Existing Voices**: All older voice profiles remain fully functional
333
+ - **New Features**: To benefit from improved audio quality, consider re-creating voice profiles
334
+ - **Project Compatibility**: Existing audiobook projects work without modification
335
+ - **Regeneration**: Individual chunks can be regenerated with improved quality settings
336
+
337
+ ### **Batch Processing Considerations**
338
+ - Large batch jobs may take significant time depending on text length and hardware
339
+ - Monitor system resources during extended batch processing sessions
340
+ - Consider processing very large books in smaller batches for better control
341
+
342
+ ---
343
+
344
+ ## 📋 Supported Formats
345
+
346
+ ### Input
347
+ - **Text**: `.txt`, `.md`, formatted stories and scripts
348
+ - **Audio Samples**: `.wav`, `.mp3`, `.flac` for voice cloning
349
+ - **Batch Files**: Multiple text files for queue processing
350
+
351
+ ### Output
352
+ - **Audio**: High-quality `.wav` files with professional volume levels
353
+ - **Projects**: Organized folder structure with chapters
354
+ - **Exports**: Ready-to-use audiobook files
355
+ - **Batch Results**: Multiple completed audiobooks from queue processing
356
+
357
+ ---
358
+
359
+ ## 🆘 Support
360
+
361
+ - **Features Guide**: See `AUDIOBOOK_FEATURES.md` for detailed capabilities
362
+ - **Development Notes**: Check `development/` folder for technical details
363
+ - **Issues**: Report problems via GitHub issues
364
+
365
+ ---
366
+
367
+ ## 📄 License
368
+
369
+ This project is licensed under the terms specified in `LICENSE`.
370
+
371
+ ---
372
+
373
+ **🎉 Ready to create amazing audiobooks with professional volume levels and enhanced audio quality? Run `./launch_audiobook.bat` and start generating!**