They seem to be optimizing for benchmarks instead of real-world use.

Yeah, if only Gemini performed half as well in real use as it does on benches, we'd actually be using it.