gab-gdp committed on
Commit
e565dd3
·
verified ·
1 Parent(s): 1172b77

Update README.md

Files changed (1)
  1. README.md +10 -7
README.md CHANGED
@@ -67,7 +67,7 @@ All the following results have been generated with 200 steps, CFG scale of 7, se
67
 
68
  | BPM | Spectral Centroid | Spectral Flatness | Harmonic/Percussive Ratio | Transient Sharpness | CLAP Prompt Score |
69
  |:--|:--|:--|:--|:--|:--|
70
- | **100.45** | **2540.28** | **0.000284** | **1.412** | **0.000059** | **0.474** |
71
 
72
  ---
73
 
@@ -76,11 +76,11 @@ All the following results have been generated with 200 steps, CFG scale of 7, se
76
 
77
  | Stable Audio Open 1.0 | StableBeaT |
78
  |:--|:--|
79
- | <audio controls src="https://huggingface.co/gab-gdp/sao-finetuned-trap-rap-beat/resolve/main/results/stable-audio-1/1984661836.wav"></audio> | <audio controls src="https://huggingface.co/gab-gdp/sao-finetuned-trap-rap-beat/resolve/main/results/dreamt_14/1784661836.wav"></audio> |
80
 
81
  | BPM | Spectral Centroid | Spectral Flatness | Harmonic/Percussive Ratio | Transient Sharpness | CLAP Prompt Score |
82
  |:--|:--|:--|:--|:--|:--|
83
- | **125.00** | **685.30** | **0.000013** | **0.543** | **0.000432** | **0.422** |
84
 
85
  ---
86
 
@@ -93,7 +93,7 @@ All the following results have been generated with 200 steps, CFG scale of 7, se
93
 
94
  | BPM | Spectral Centroid | Spectral Flatness | Harmonic/Percussive Ratio | Transient Sharpness | CLAP Prompt Score |
95
  |:--|:--|:--|:--|:--|:--|
96
- | **82.72** | **1056.42** | **0.000046** | **0.645** | **0.000089** | **0.468** |
97
 
98
  ---
99
 
@@ -106,7 +106,7 @@ All the following results have been generated with 200 steps, CFG scale of 7, se
106
 
107
  | BPM | Spectral Centroid | Spectral Flatness | Harmonic/Percussive Ratio | Transient Sharpness | CLAP Prompt Score |
108
  |:--|:--|:--|:--|:--|:--|
109
- | **144.2** | **2458.5** | **0.000356** | **0.738** | **0.00206** | **0.357** |
110
 
111
  ---
112
 
@@ -115,11 +115,11 @@ All the following results have been generated with 200 steps, CFG scale of 7, se
115
 
116
  | Stable Audio Open 1.0 | Stable BeaT |
117
  |:--|:--|
118
- | <audio controls src="https://huggingface.co/gab-gdp/sao-finetuned-trap-rap-beat/resolve/main/results/stable-audio-1/2321349264.wav"></audio> | <audio controls src="https://huggingface.co/gab-gdp/sao-finetuned-trap-rap-beat/resolve/main/results/dreamt_14/1121349264.wav"></audio> |
119
 
120
  | BPM | Spectral Centroid | Spectral Flatness | Harmonic/Percussive Ratio | Transient Sharpness | CLAP Prompt Score |
121
  |:--|:--|:--|:--|:--|:--|
122
- | **106.13** | **2920.15** | **0.000258** | **0.808** | **0.000143** | **0.535** |
123
 
124
 
125
  ---
@@ -203,11 +203,14 @@ However, the model tends to underperform on styles that were underrepresented in
203
  This limitation mainly stems from the uneven tag distribution within the dataset: certain instruments and genres are simply less present.
204
  In addition, the tagging tool (CLAP), trained on general-purpose music datasets like LAION-Audio-630K, is not specialized for specific genres such as trap or hip-hop, leading to imprecise tagging of elements like snares, hi-hats, or 808 bass.
205
  As a result, these styles are harder for the model to reproduce accurately.
 
206
 
207
  # Perspectives
208
 
209
  I'd like to fine-tune for only 2-3 more epochs on a smaller dataset that better represents the underrepresented styles.
210
  It'd be interesting to start over with a CLAP specialized on trap/rap genres.
 
 
211
  I’m open to any feedback or suggestions on my work.
212
 
213
  ## Sources
 
67
 
68
  | BPM | Spectral Centroid | Spectral Flatness | Harmonic/Percussive Ratio | Transient Sharpness | CLAP Prompt Score |
69
  |:--|:--|:--|:--|:--|:--|
70
+ | **100.45** | **2540.28** | **0.000284** | **1.412** | **0.0000585** | **0.523** |
71
 
72
  ---
73
 
 
76
 
77
  | Stable Audio Open 1.0 | StableBeaT |
78
  |:--|:--|
79
+ | <audio controls src="https://huggingface.co/gab-gdp/sao-finetuned-trap-rap-beat/resolve/main/results/stable-audio-1/1784661836.wav"></audio> | <audio controls src="https://huggingface.co/gab-gdp/sao-finetuned-trap-rap-beat/resolve/main/results/dreamt_14/1784661836.wav"></audio> |
80
 
81
  | BPM | Spectral Centroid | Spectral Flatness | Harmonic/Percussive Ratio | Transient Sharpness | CLAP Prompt Score |
82
  |:--|:--|:--|:--|:--|:--|
83
+ | **148.02** | **4287.26** | **0.00179** | **2.963** | **0.000195** | **0.552** |
84
 
85
  ---
86
 
 
93
 
94
  | BPM | Spectral Centroid | Spectral Flatness | Harmonic/Percussive Ratio | Transient Sharpness | CLAP Prompt Score |
95
  |:--|:--|:--|:--|:--|:--|
96
+ | **82.72** | **1056.42** | **0.000046** | **0.645** | **0.000089** | **0.478** |
97
 
98
  ---
99
 
 
106
 
107
  | BPM | Spectral Centroid | Spectral Flatness | Harmonic/Percussive Ratio | Transient Sharpness | CLAP Prompt Score |
108
  |:--|:--|:--|:--|:--|:--|
109
+ | **144.2** | **2458.5** | **0.000356** | **0.738** | **0.00206** | **0.363** |
110
 
111
  ---
112
 
 
115
 
116
  | Stable Audio Open 1.0 | Stable BeaT |
117
  |:--|:--|
118
+ | <audio controls src="https://huggingface.co/gab-gdp/sao-finetuned-trap-rap-beat/resolve/main/results/stable-audio-1/1121349264.wav"></audio> | <audio controls src="https://huggingface.co/gab-gdp/sao-finetuned-trap-rap-beat/resolve/main/results/dreamt_14/1121349264.wav"></audio> |
119
 
120
  | BPM | Spectral Centroid | Spectral Flatness | Harmonic/Percussive Ratio | Transient Sharpness | CLAP Prompt Score |
121
  |:--|:--|:--|:--|:--|:--|
122
+ | **130.81** | **1000.87** | **0.000166** | **0.679** | **0.000007288** | **0.250** |
123
 
124
 
125
  ---
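
The README does not say how the table metrics were computed, so the following is only a minimal sketch of two of them, spectral centroid and spectral flatness, using their usual textbook definitions. The function names and the naive DFT are assumptions made for illustration, not the author's evaluation code. A real pipeline would more likely use an audio library and average over short frames.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Naive O(n^2) DFT magnitudes for the non-negative frequency bins.

    Hypothetical helper for illustration; fine for short signals only.
    """
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(samples):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        mags.append(abs(acc))
    return mags


def spectral_centroid(samples, sample_rate):
    """Magnitude-weighted mean frequency, in Hz."""
    mags = magnitude_spectrum(samples)
    total = sum(mags)
    if total == 0:
        return 0.0
    n = len(samples)
    return sum((k * sample_rate / n) * m for k, m in enumerate(mags)) / total


def spectral_flatness(samples, eps=1e-12):
    """Geometric mean over arithmetic mean of the power spectrum.

    Values near 0 indicate a tonal signal, values near 1 a noise-like one.
    eps (an assumed floor) keeps the logarithm finite for empty bins.
    """
    power = [m * m + eps for m in magnitude_spectrum(samples)]
    log_mean = sum(math.log(p) for p in power) / len(power)
    return math.exp(log_mean) / (sum(power) / len(power))
```

For a pure 1 kHz sine sampled at 8 kHz, the centroid comes out near 1000 Hz and the flatness close to zero, while white noise gives a much higher flatness. This matches the reading of the tables: a lower flatness points to a more tonal clip.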
 
203
  This limitation mainly stems from the uneven tag distribution within the dataset: certain instruments and genres are simply less present.
204
  In addition, the tagging tool (CLAP), trained on general-purpose music datasets like LAION-Audio-630K, is not specialized for specific genres such as trap or hip-hop, leading to imprecise tagging of elements like snares, hi-hats, or 808 bass.
205
  As a result, these styles are harder for the model to reproduce accurately.
206
+ I also noticed that generated melodic elements, such as piano or synths, often sound much quieter than the drums, possibly because their frequency content is more subtle.
207
 
208
  # Perspectives
209
 
210
  I'd like to fine-tune for only 2-3 more epochs on a smaller dataset that better represents the underrepresented styles.
211
  It'd be interesting to start over with a CLAP specialized on trap/rap genres.
212
+ I'm also interested in noise-input conditioning, such as [**SpecGrad**](https://arxiv.org/pdf/2203.16749).
213
+
214
  I’m open to any feedback or suggestions on my work.
215
 
216
  ## Sources