Correct me if I'm wrong, but I believe you're interpreting this as if the Rails app needs the same amount of JS that exists in the described Vite solution.
But in reality, if you're using Hotwire you can get away with almost no JavaScript at all by comparison. That's why Stimulus is generally plain vanilla JS: it's meant for sprinkling behavior onto the DOM rather than controlling the DOM.
So if you don't have a JS framework that needs to control the whole DOM, and you don't need a gigantic optimization step or tree shaking or TypeScript or whatever, you can get away with a whole lot less than if you embraced those frameworks that _do_ want to own the DOM wholesale.
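
To make the "sprinkling" point a bit more concrete, here's a rough sketch in TypeScript. It is not Stimulus's actual API: `PanelLike` and `togglePanel` are names I made up for illustration, and the plain object stands in for an element the server already rendered, so the snippet runs without a browser.

```typescript
// Minimal stand-in for a server-rendered element, so the sketch runs without a browser.
// In a real page this would be markup that Rails already produced.
interface PanelLike {
  hidden: boolean;
}

// Hypothetical helper (not a Stimulus API): flips the visibility of existing markup.
// It creates no elements and renders no templates; the server still owns the HTML.
function togglePanel(panel: PanelLike): boolean {
  panel.hidden = !panel.hidden;
  return panel.hidden;
}

// Usage: the panel starts hidden, and one call reveals it.
const panel: PanelLike = { hidden: true };
togglePanel(panel);
```

The whole "frontend" here is one small function attached to markup that already exists, which is roughly why a heavy build pipeline feels optional in that setup.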