How does this stack up against Whisper ctranslate2?
#12, opened by 1TBGPU4EVR
Whisper on CTranslate2 has been my go-to for a while. I got confused seeing NVIDIA put out what looks like a PR from Qwen, but it 'sounds' (ha) awesome. Diarization, IIRC, is a hack, at least for Whisper, but I'm more interested in the quality of this model, especially with Qwen as the base, since IMHO their models have excellent inference quality. Of course I'll try it myself, but I'm wondering if y'all have tested it against the usual suspects. Qwen is an excellent multimodal base for anything, especially Qwen2.5-VL. Qwen2.5-VL doesn't do audio out of the box, so this was a real fine-tune. I know it does long-form video... hmm. Have you looked under the hood of GLM 4.*? Not to throw the thread off topic, just saying.
Ramble, over.
Thank you.