Slowing down with respect to the original speed of response. Basically, what we used to get a few months back, which is also the best possible experience.

There is no "original speed of response". The more resources you pour in, the faster it goes.

Watch them decrease resources for the normal mode so people are penny pinched into using fast mode.

Seriously, looking at the price structure of this (6x the price for 2.5x the speed, if that's correct), it seems to target something like real-time applications with very small context. Maybe voice assistants? I guess that if you're doing development, it makes more sense to parallelize over more agents rather than paying that much for a modest increase in speed.
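
To put rough numbers on that, here is a back-of-the-envelope sketch. The 6x and 2.5x figures are the unconfirmed ones quoted above, and the function names are made up for illustration. It also assumes the work splits cleanly across agents, which is often not true in practice.

```python
import math

# Figures quoted above; unconfirmed ("if that's correct").
PRICE_MULTIPLIER = 6.0   # fast mode price relative to normal mode
SPEED_MULTIPLIER = 2.5   # fast mode speed relative to normal mode


def relative_cost_per_throughput(price_multiplier, speed_multiplier):
    """Price paid per unit of throughput, relative to normal mode (1.0)."""
    return price_multiplier / speed_multiplier


def agents_to_match(speed_multiplier):
    """Normal-mode agents needed to match that throughput.

    Assumes ideal parallelism: N agents give N times the throughput.
    """
    return math.ceil(speed_multiplier)


fast_ratio = relative_cost_per_throughput(PRICE_MULTIPLIER, SPEED_MULTIPLIER)
n_agents = agents_to_match(SPEED_MULTIPLIER)
parallel_cost = n_agents * 1.0  # each normal-mode agent costs 1x

print(f"fast mode: {fast_ratio:.1f}x the cost per unit of throughput")
print(f"parallel:  {n_agents} normal agents at {parallel_cost:.0f}x total cost")
```

Under those assumptions, fast mode costs 2.4x as much per unit of work, while three normal-mode agents match the throughput at 3x total cost instead of 6x. The catch is that parallel agents do nothing for the latency of a single response, which is why the pricing looks aimed at latency-sensitive uses rather than bulk development work.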