Even in languages like C, it's mostly voodoo in practice unless you're using undersized indices anyway—you surely don't have anywhere near SIZE_MAX/2 elements.

The underlying problem is sloppy use of int for indices. The roundabout calculation will still overflow at twice the size.

For scalar code that's fair, but keep in mind that with `gather`, you can vectorize it, and there you do worry about integer width as more width means fewer lanes. Though tbf you probably should use B+ Tree structures chunked to your bulk/cache fetch "line/page" size, and vectorized higher-than-radix-2 search within each node.