In my experience, there's little difference between frontier models and SotA ~30B param models when it comes to implementing individual functions.
Once you have a coherent design (the hard part), you can feed it to a pretty small model and get basically the same quality.
They won't one-shot it, but they're faster and cheaper, so it still works out in your favor.
Plus you can do it locally...
I've had a similar experience. However, once you include code review, I think the GPT model is the most impressive.