
@@ -67,7 +67,7 @@ All the following results have been generated with 200 steps, CFG scale of 7, se
 | BPM | Spectral Centroid | Spectral Flatness | Harmonic/Percussive Ratio | Transient Sharpness | CLAP Prompt Score |
 |:--|:--|:--|:--|:--|:--|
-| **100.45** | **2540.28** | **0.000284** | **1.412** | **0.000059** | **0.474** |
 ---
@@ -76,11 +76,11 @@ All the following results have been generated with 200 steps, CFG scale of 7, se
 | Stable Audio Open 1.0 | StableBeaT |
 |:--|:--|
-| <audio controls src="https://huggingface.co/gab-gdp/sao-finetuned-trap-rap-beat/resolve/main/results/stable-audio-1/1984661836.wav"></audio> | <audio controls src="https://huggingface.co/gab-gdp/sao-finetuned-trap-rap-beat/resolve/main/results/dreamt_14/1784661836.wav"></audio> |
 | BPM | Spectral Centroid | Spectral Flatness | Harmonic/Percussive Ratio | Transient Sharpness | CLAP Prompt Score |
 |:--|:--|:--|:--|:--|:--|
-| **125.00** | **685.30** | **0.000013** | **0.543** | **0.000432** | **0.422** |
 ---
@@ -93,7 +93,7 @@ All the following results have been generated with 200 steps, CFG scale of 7, se
 | BPM | Spectral Centroid | Spectral Flatness | Harmonic/Percussive Ratio | Transient Sharpness | CLAP Prompt Score |
 |:--|:--|:--|:--|:--|:--|
-| **82.72** | **1056.42** | **0.000046** | **0.645** | **0.000089** | **0.468** |
 ---
@@ -106,7 +106,7 @@ All the following results have been generated with 200 steps, CFG scale of 7, se
 | BPM | Spectral Centroid | Spectral Flatness | Harmonic/Percussive Ratio | Transient Sharpness | CLAP Prompt Score |
 |:--|:--|:--|:--|:--|:--|
-| **144.2** | **2458.5** | **0.000356** | **0.738** | **0.00206** | **0.357** |
 ---
@@ -115,11 +115,11 @@ All the following results have been generated with 200 steps, CFG scale of 7, se
 | Stable Audio Open 1.0 | StableBeaT |
 |:--|:--|
-| <audio controls src="https://huggingface.co/gab-gdp/sao-finetuned-trap-rap-beat/resolve/main/results/stable-audio-1/2321349264.wav"></audio> | <audio controls src="https://huggingface.co/gab-gdp/sao-finetuned-trap-rap-beat/resolve/main/results/dreamt_14/1121349264.wav"></audio> |
 | BPM | Spectral Centroid | Spectral Flatness | Harmonic/Percussive Ratio | Transient Sharpness | CLAP Prompt Score |
 |:--|:--|:--|:--|:--|:--|
-| **106.13** | **2920.15** | **0.000258** | **0.808** | **0.000143** | **0.535** |
 ---
@@ -203,11 +203,14 @@ However, the model tends to underperform on styles that were underrepresented in
 This limitation mainly stems from the uneven tag distribution within the dataset: certain instruments and genres are simply less present.
 In addition, the tagging tool (CLAP), trained on general-purpose music datasets like LAION-Audio-630K, is not specialized for specific genres such as trap or hip-hop, leading to imprecise tagging of elements like snares, hi-hats, or 808 bass.
 As a result, these styles are harder for the model to reproduce accurately.
 # Perspectives
 I'd like to fine-tune for just 2-3 more epochs on a smaller dataset that better represents the underrepresented styles.
 It'd be interesting to start over with a CLAP model specialized in trap/rap genres.
 I’m open to any feedback or suggestions on my work.
 ## Sources

 | BPM | Spectral Centroid | Spectral Flatness | Harmonic/Percussive Ratio | Transient Sharpness | CLAP Prompt Score |
 |:--|:--|:--|:--|:--|:--|
+| **100.45** | **2540.28** | **0.000284** | **1.412** | **0.0000585** | **0.523** |
 ---
 | Stable Audio Open 1.0 | StableBeaT |
 |:--|:--|
+| <audio controls src="https://huggingface.co/gab-gdp/sao-finetuned-trap-rap-beat/resolve/main/results/stable-audio-1/1784661836.wav"></audio> | <audio controls src="https://huggingface.co/gab-gdp/sao-finetuned-trap-rap-beat/resolve/main/results/dreamt_14/1784661836.wav"></audio> |
 | BPM | Spectral Centroid | Spectral Flatness | Harmonic/Percussive Ratio | Transient Sharpness | CLAP Prompt Score |
 |:--|:--|:--|:--|:--|:--|
+| **148.02** | **4287.26** | **0.00179** | **2.963** | **0.000195** | **0.552** |
 ---
 | BPM | Spectral Centroid | Spectral Flatness | Harmonic/Percussive Ratio | Transient Sharpness | CLAP Prompt Score |
 |:--|:--|:--|:--|:--|:--|
+| **82.72** | **1056.42** | **0.000046** | **0.645** | **0.000089** | **0.478** |
 ---
 | BPM | Spectral Centroid | Spectral Flatness | Harmonic/Percussive Ratio | Transient Sharpness | CLAP Prompt Score |
 |:--|:--|:--|:--|:--|:--|
+| **144.2** | **2458.5** | **0.000356** | **0.738** | **0.00206** | **0.363** |
 ---
 | Stable Audio Open 1.0 | StableBeaT |
 |:--|:--|
+| <audio controls src="https://huggingface.co/gab-gdp/sao-finetuned-trap-rap-beat/resolve/main/results/stable-audio-1/1121349264.wav"></audio> | <audio controls src="https://huggingface.co/gab-gdp/sao-finetuned-trap-rap-beat/resolve/main/results/dreamt_14/1121349264.wav"></audio> |
 | BPM | Spectral Centroid | Spectral Flatness | Harmonic/Percussive Ratio | Transient Sharpness | CLAP Prompt Score |
 |:--|:--|:--|:--|:--|:--|
+| **130.81** | **1000.87** | **0.000166** | **0.679** | **0.000007288** | **0.250** |
 ---
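The tables above report spectral descriptors without stating how they were obtained. As a rough illustration only, the sketch below computes two of them, spectral centroid and spectral flatness, from their textbook definitions on a synthetic tone. The function names, the naive DFT, and the test signal are all assumptions made for this example; it is not the evaluation code behind the tables, and absolute values depend on framing and scaling choices that are not documented here.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Naive DFT magnitudes for bins 0..N/2 (only suitable for short frames)."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
        mags.append(abs(acc))
    return mags


def spectral_centroid(mags, sample_rate, n):
    """Magnitude-weighted mean frequency in Hz."""
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum((k * sample_rate / n) * m for k, m in enumerate(mags)) / total


def spectral_flatness(mags, eps=1e-10):
    """Geometric mean over arithmetic mean of the power spectrum.

    Values near 0 indicate a tonal signal, values near 1 a noise-like one.
    """
    power = [max(m * m, eps) for m in mags]
    log_mean = sum(math.log(p) for p in power) / len(power)
    return math.exp(log_mean) / (sum(power) / len(power))


# Assumed test signal: a 1 kHz sine that fits exactly 32 cycles into the frame,
# so its energy falls into a single DFT bin.
sample_rate = 8000
n = 256
tone = [math.sin(2 * math.pi * 1000 * i / sample_rate) for i in range(n)]

mags = magnitude_spectrum(tone)
centroid = spectral_centroid(mags, sample_rate, n)  # close to 1000 Hz
flatness = spectral_flatness(mags)                  # very close to 0 (pure tone)
```

For real audio files one would typically average these descriptors over many short frames, for instance with an audio analysis library, rather than run a naive DFT over the whole signal.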
 This limitation mainly stems from the uneven tag distribution within the dataset: certain instruments and genres are simply less present.
 In addition, the tagging tool (CLAP), trained on general-purpose music datasets like LAION-Audio-630K, is not specialized for specific genres such as trap or hip-hop, leading to imprecise tagging of elements like snares, hi-hats, or 808 bass.
 As a result, these styles are harder for the model to reproduce accurately.
+I also noticed that the generated melodic elements, such as piano or synths, often sound much quieter than the drums, since their frequency content is more subtle.
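One way to put a number on the loudness imbalance described above is to compare the RMS level of the melodic part with that of the drums. The sketch below assumes the two parts are already available as separate mono signals (for example after some harmonic/percussive separation step, which is not shown); the signals, names, and amplitudes are hypothetical, and this is not necessarily how the Harmonic/Percussive Ratio in the tables was computed.

```python
import math


def rms(samples):
    """Root-mean-square level of a mono signal."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def level_gap_db(melodic, drums, eps=1e-12):
    """Melodic level relative to the drums in dB (negative means quieter)."""
    return 20.0 * math.log10((rms(melodic) + eps) / (rms(drums) + eps))


sample_rate = 8000
# Hypothetical stems: a quiet 440 Hz tone and a louder 60 Hz "kick-like" tone,
# each one second long.
melodic = [0.1 * math.sin(2 * math.pi * 440 * i / sample_rate) for i in range(sample_rate)]
drums = [0.5 * math.sin(2 * math.pi * 60 * i / sample_rate) for i in range(sample_rate)]

gap = level_gap_db(melodic, drums)  # about -14 dB: melodic part is 5x lower in amplitude
```

Tracking such a gap across generated samples would show whether the imbalance is systematic or limited to certain prompts.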
 # Perspectives
 I'd like to fine-tune for just 2-3 more epochs on a smaller dataset that better represents the underrepresented styles.
 It'd be interesting to start over with a CLAP model specialized in trap/rap genres.
+I'm also interested in noise input conditioning, such as [**SpecGrad**](https://arxiv.org/pdf/2203.16749).
 I’m open to any feedback or suggestions on my work.
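As a starting point for the smaller, rebalanced fine-tuning set mentioned above, the sketch below counts how often each tag occurs and keeps the clips that carry at least one rare tag. The clip names, tags, and the 20% threshold are made up for illustration; the real dataset layout is not described here, so treat this as a minimal sketch rather than the actual selection procedure.

```python
from collections import Counter

# Hypothetical metadata: one tag list per training clip.
clips = {
    "clip_001.wav": ["trap", "808", "hi-hat"],
    "clip_002.wav": ["trap", "808", "piano"],
    "clip_003.wav": ["trap", "hi-hat", "snare"],
    "clip_004.wav": ["boom bap", "vinyl", "snare"],
    "clip_005.wav": ["trap", "808", "flute"],
}


def rare_tags(clips, max_share):
    """Tags that appear in at most `max_share` of the clips."""
    counts = Counter(tag for tags in clips.values() for tag in set(tags))
    limit = max_share * len(clips)
    return {tag for tag, count in counts.items() if count <= limit}


def select_clips(clips, tags):
    """Names of clips carrying at least one of the given tags, sorted."""
    return sorted(name for name, clip_tags in clips.items() if tags & set(clip_tags))


rare = rare_tags(clips, max_share=0.2)
subset = select_clips(clips, rare)
```

With this toy metadata, the rare tags are the ones that occur in a single clip, and `subset` holds the three clips that contain them.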
 ## Sources