hey mate! thanks for your feedback.
indeed, I'm running to two problems on the analyzer side: 1. align model sliding off (especially w/ chorus/back vocals present) 2. transcript skipping parts of lyrics in lyrics-heavy tracks (I tried a lot of russian rap, lol)
happy for contributions as I'm not that experienced w/ machine learning side of the project, mostly it was emperical "tweak the parameters and look what is changed"
also model only affects the transcript job (I need to make it clearer in the UI). For the alignment, it's a single model provided by whisperx