So, can it handle multiple languages in one video, or do you need to segment the different languages using LID first? This has been a thorny issue for people working in multilingual audio (there are at least two or three of us).

I haven't tested that specific edge case, I'm sorry. I tested two languages in a normal conversation and that worked fine. "Auto or English" handles multiple languages the best.