Azure recently discontinued the gpt-4.1 model, so I had to move off it. Every gpt-5* model I tried was worse (more failures, lower accuracy) and more expensive. I ended up rewriting the entire system for non-gpt models, going from high-school-level prompts to lower-elementary-school-level prompts.
I would say models hit a bottleneck a long time ago. My personal opinion is that providers are now overfitting newer models on coding and "agentic" capabilities, at great expense to general abilities in other domains.