Update README.md
README.md
CHANGED
|
@@ -67,7 +67,7 @@ All the following results have been generated with 200 steps, CFG scale of 7, se
|
|
| 67 |
|
| 68 |
| BPM | Spectral Centroid | Spectral Flatness | Harmonic/Percussive Ratio | Transient Sharpness | CLAP Prompt Score |
|
| 69 |
|:--|:--|:--|:--|:--|:--|
|
| 70 |
-
| **100.45** | **2540.28** | **0.000284** | **1.412** | **0.
|
| 71 |
|
| 72 |
---
|
| 73 |
|
|
@@ -76,11 +76,11 @@ All the following results have been generated with 200 steps, CFG scale of 7, se
|
|
| 76 |
|
| 77 |
| Stable Audio Open 1.0 | StableBeaT |
|
| 78 |
|:--|:--|
|
| 79 |
-
| <audio controls src="https://huggingface.co/gab-gdp/sao-finetuned-trap-rap-beat/resolve/main/results/stable-audio-1/
|
| 80 |
|
| 81 |
| BPM | Spectral Centroid | Spectral Flatness | Harmonic/Percussive Ratio | Transient Sharpness | CLAP Prompt Score |
|
| 82 |
|:--|:--|:--|:--|:--|:--|
|
| 83 |
-
| **
|
| 84 |
|
| 85 |
---
|
| 86 |
|
|
@@ -93,7 +93,7 @@ All the following results have been generated with 200 steps, CFG scale of 7, se
|
|
| 93 |
|
| 94 |
| BPM | Spectral Centroid | Spectral Flatness | Harmonic/Percussive Ratio | Transient Sharpness | CLAP Prompt Score |
|
| 95 |
|:--|:--|:--|:--|:--|:--|
|
| 96 |
-
| **82.72** | **1056.42** | **0.000046** | **0.645** | **0.000089** | **0.
|
| 97 |
|
| 98 |
---
|
| 99 |
|
|
@@ -106,7 +106,7 @@ All the following results have been generated with 200 steps, CFG scale of 7, se
|
|
| 106 |
|
| 107 |
| BPM | Spectral Centroid | Spectral Flatness | Harmonic/Percussive Ratio | Transient Sharpness | CLAP Prompt Score |
|
| 108 |
|:--|:--|:--|:--|:--|:--|
|
| 109 |
-
| **144.2** | **2458.5** | **0.000356** | **0.738** | **0.00206** | **0.
|
| 110 |
|
| 111 |
---
|
| 112 |
|
|
@@ -115,11 +115,11 @@ All the following results have been generated with 200 steps, CFG scale of 7, se
|
|
| 115 |
|
| 116 |
| Stable Audio Open 1.0 | StableBeaT |
|
| 117 |
|:--|:--|
|
| 118 |
-
| <audio controls src="https://huggingface.co/gab-gdp/sao-finetuned-trap-rap-beat/resolve/main/results/stable-audio-1/
|
| 119 |
|
| 120 |
| BPM | Spectral Centroid | Spectral Flatness | Harmonic/Percussive Ratio | Transient Sharpness | CLAP Prompt Score |
|
| 121 |
|:--|:--|:--|:--|:--|:--|
|
| 122 |
-
| **
|
| 123 |
|
| 124 |
|
| 125 |
---
|
|
@@ -203,11 +203,14 @@ However, the model tends to underperform on styles that were underrepresented in
|
|
| 203 |
This limitation mainly stems from the uneven tag distribution within the dataset: certain instruments and genres are simply less present.
|
| 204 |
In addition, the tagging tool (CLAP), trained on general-purpose music datasets like LAION-Audio-630K, is not specialized for specific genres such as trap or hip-hop, leading to imprecise tagging of elements like snares, hi-hats, or 808 bass.
|
| 205 |
As a result, these styles are harder for the model to reproduce accurately.
|
|
|
|
| 206 |
|
| 207 |
# Perspectives
|
| 208 |
|
| 209 |
I'd like to fine-tune for only 2-3 more epochs on a smaller dataset that better represents the underrepresented styles.
|
| 210 |
It'd be interesting to start over with a CLAP model specialized in trap/rap genres.
|
|
|
|
|
|
|
| 211 |
I’m open to any feedback or suggestions on my work.
|
| 212 |
|
| 213 |
## Sources
|
|
|
|
| 67 |
|
| 68 |
| BPM | Spectral Centroid | Spectral Flatness | Harmonic/Percussive Ratio | Transient Sharpness | CLAP Prompt Score |
|
| 69 |
|:--|:--|:--|:--|:--|:--|
|
| 70 |
+
| **100.45** | **2540.28** | **0.000284** | **1.412** | **0.0000585** | **0.523** |
|
| 71 |
|
| 72 |
---
|
| 73 |
|
|
|
|
| 76 |
|
| 77 |
| Stable Audio Open 1.0 | StableBeaT |
|
| 78 |
|:--|:--|
|
| 79 |
+
| <audio controls src="https://huggingface.co/gab-gdp/sao-finetuned-trap-rap-beat/resolve/main/results/stable-audio-1/1784661836.wav"></audio> | <audio controls src="https://huggingface.co/gab-gdp/sao-finetuned-trap-rap-beat/resolve/main/results/dreamt_14/1784661836.wav"></audio> |
|
| 80 |
|
| 81 |
| BPM | Spectral Centroid | Spectral Flatness | Harmonic/Percussive Ratio | Transient Sharpness | CLAP Prompt Score |
|
| 82 |
|:--|:--|:--|:--|:--|:--|
|
| 83 |
+
| **148.02** | **4287.26** | **0.00179** | **2.963** | **0.000195** | **0.552** |
|
| 84 |
|
| 85 |
---
|
| 86 |
|
|
|
|
| 93 |
|
| 94 |
| BPM | Spectral Centroid | Spectral Flatness | Harmonic/Percussive Ratio | Transient Sharpness | CLAP Prompt Score |
|
| 95 |
|:--|:--|:--|:--|:--|:--|
|
| 96 |
+
| **82.72** | **1056.42** | **0.000046** | **0.645** | **0.000089** | **0.478** |
|
| 97 |
|
| 98 |
---
|
| 99 |
|
|
|
|
| 106 |
|
| 107 |
| BPM | Spectral Centroid | Spectral Flatness | Harmonic/Percussive Ratio | Transient Sharpness | CLAP Prompt Score |
|
| 108 |
|:--|:--|:--|:--|:--|:--|
|
| 109 |
+
| **144.2** | **2458.5** | **0.000356** | **0.738** | **0.00206** | **0.363** |
|
| 110 |
|
| 111 |
---
|
| 112 |
|
|
|
|
| 115 |
|
| 116 |
| Stable Audio Open 1.0 | StableBeaT |
|
| 117 |
|:--|:--|
|
| 118 |
+
| <audio controls src="https://huggingface.co/gab-gdp/sao-finetuned-trap-rap-beat/resolve/main/results/stable-audio-1/1121349264.wav"></audio> | <audio controls src="https://huggingface.co/gab-gdp/sao-finetuned-trap-rap-beat/resolve/main/results/dreamt_14/1121349264.wav"></audio> |
|
| 119 |
|
| 120 |
| BPM | Spectral Centroid | Spectral Flatness | Harmonic/Percussive Ratio | Transient Sharpness | CLAP Prompt Score |
|
| 121 |
|:--|:--|:--|:--|:--|:--|
|
| 122 |
+
| **130.81** | **1000.87** | **0.000166** | **0.679** | **0.000007288** | **0.250** |
|
| 123 |
|
| 124 |
|
| 125 |
---
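
The README does not say how the table metrics were computed, so the following is only a sketch of two of them (spectral centroid and spectral flatness) using their textbook definitions. It runs a single naive DFT over the whole clip with the standard library only; the function names are hypothetical, and a frame-averaged implementation from an audio library would likely give different absolute values than those in the tables.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Naive DFT magnitudes for bins 0..N/2 (O(N^2), only suitable for short clips)."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(samples))
        mags.append(abs(acc))
    return mags


def spectral_centroid(samples, sample_rate):
    """Magnitude-weighted mean frequency in Hz (0.0 for a silent clip)."""
    mags = magnitude_spectrum(samples)
    n = len(samples)
    freqs = [k * sample_rate / n for k in range(len(mags))]
    total = sum(mags)
    if total <= 0:
        return 0.0
    return sum(f * m for f, m in zip(freqs, mags)) / total


def spectral_flatness(samples, eps=1e-10):
    """Geometric mean over arithmetic mean of the power spectrum.

    Values near 0 indicate a tonal signal, values near 1 a noise-like one.
    eps is an assumed floor that keeps log() defined for empty bins.
    """
    power = [m * m + eps for m in magnitude_spectrum(samples)]
    log_mean = sum(math.log(p) for p in power) / len(power)
    return math.exp(log_mean) / (sum(power) / len(power))
```

As a sanity check, a pure 440 Hz sine should give a centroid close to 440 Hz and a flatness near 0, while a single-sample impulse (flat spectrum) should give a flatness near 1.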
|
|
|
|
| 203 |
This limitation mainly stems from the uneven tag distribution within the dataset: certain instruments and genres are simply less present.
|
| 204 |
In addition, the tagging tool (CLAP), trained on general-purpose music datasets like LAION-Audio-630K, is not specialized for specific genres such as trap or hip-hop, leading to imprecise tagging of elements like snares, hi-hats, or 808 bass.
|
| 205 |
As a result, these styles are harder for the model to reproduce accurately.
|
| 206 |
+
I also noticed that the generated melodic elements, like piano or synths, often sound much quieter than the drums, possibly because their frequency content is more subtle than the drums' transients.
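
The uneven tag distribution described above could be inspected with a simple count before any further fine-tuning. This is a minimal sketch, assuming each clip's tags are available as a list of strings; the function names, the example tags, and the `min_share` threshold are all hypothetical and not taken from the actual dataset.

```python
from collections import Counter


def tag_shares(tagged_clips):
    """Return {tag: fraction of clips carrying that tag}, most common first."""
    # set(tags) ensures a tag repeated within one clip is counted once.
    counts = Counter(tag for tags in tagged_clips for tag in set(tags))
    n = len(tagged_clips)
    return {tag: c / n for tag, c in counts.most_common()}


def underrepresented(tagged_clips, min_share=0.05):
    """Tags that appear in fewer than min_share of the clips (assumed threshold)."""
    shares = tag_shares(tagged_clips)
    return sorted(tag for tag, share in shares.items() if share < min_share)


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    clips = [["808", "hi-hat"]] * 19 + [["piano", "808"]]
    print(tag_shares(clips))
    print(underrepresented(clips, min_share=0.1))
```

Tags flagged this way would be natural candidates for the smaller, more balanced dataset mentioned under Perspectives.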
|
| 207 |
|
| 208 |
# Perspectives
|
| 209 |
|
| 210 |
I'd like to fine-tune for only 2-3 more epochs on a smaller dataset that better represents the underrepresented styles.
|
| 211 |
It'd be interesting to start over with a CLAP model specialized in trap/rap genres.
|
| 212 |
+
I'm also interested in noise input conditioning, such as [**SpecGrad**](https://arxiv.org/pdf/2203.16749).
|
| 213 |
+
|
| 214 |
I’m open to any feedback or suggestions on my work.
|
| 215 |
|
| 216 |
## Sources
|