"Smart, invisible regex" sounds like a lot of bs... could you give a more technical explanation?

Also, the Whisper model doesn't really have a context window; it already segments the audio with a certain amount of overlap between the chunks. I have a hard time understanding what you are trying to say here.
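To make the chunking point concrete, here is a rough sketch of splitting a long recording into fixed-length windows that overlap. This is not Whisper's actual code: the function name `chunk_spans` and the 5-second overlap are made up for illustration, and only the 30-second window length matches the input size the Whisper models take.

```python
def chunk_spans(total_s, window_s=30.0, overlap_s=5.0):
    """Return (start, end) times in seconds that cover [0, total_s].

    Consecutive windows share `overlap_s` seconds, so words cut off at
    one boundary appear whole in the neighbouring window.
    All parameter values here are illustrative assumptions.
    """
    if window_s <= 0 or not 0 <= overlap_s < window_s:
        raise ValueError("need window_s > 0 and 0 <= overlap_s < window_s")

    step = window_s - overlap_s  # how far each window advances
    spans = []
    start = 0.0
    while start < total_s:
        end = min(start + window_s, total_s)  # clip the last window
        spans.append((start, end))
        if end >= total_s:
            break  # the recording is fully covered
        start += step
    return spans


if __name__ == "__main__":
    # A 2-hour file is just more windows, not a bigger input per step.
    print(len(chunk_spans(2 * 60 * 60)))
```

The point is that memory per step depends on the window, not on the total length, so a longer file by itself should not be a reason for failure.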

Whisper will fail >99%* of the time (*edit: most of the time) at lengths over 90 minutes, and the failure rate is fairly high past one hour.

This is absolutely not my experience. I regularly (weekly at least) use Whisper for 90-120 minute pieces of content and only rarely have problems.

This is just plain wrong. I have my own Whisper app in the App Store (on iOS, with very limited memory capacity) and there are no problems at all with longer audio/video files.

I've never had Whisper complete a single attempt at anything over 75 minutes.

You can't really claim that without saying which Whisper model in particular you are referring to, as there are a number of them.

I’ve used whisper-cpp on 5-hour podcasts without problems.

Would also love to hear what you mean by “smart invisible regex,” sounds like AI slop to me.