We see the same with Google's Flash models. It's easier to make a small capable model when you have a large model to start from.

Flash models are nowhere near Pro models in daily use. Much higher hallucinations, and easy to get into a death sprawl of failed tool uses and never come out

You should always take those claim that smaller models are as capable as larger models with a grain of salt.

Flash model n is generally a slightly better Pro model (n-1), in other words you get to use the previously premium model as a cheaper/faster version. That has value.

They do have value, because they are much much cheaper.

But no, 3.0 flash is not as good as 2.5 pro, I use both of them extensively, especially in translation. 3.0 flash will confidently mistranslate some certain things, while 2.5 pro will not.

Totally fair. Translation is one of those specific domains where model size correlates directly with quality, and no amount of architectural efficiency can fully replace parameter count.