I agree. But notice that you assume that there is a metric with which you can messure improvement. Which is fine if you are measuring against your personal taste.

But it might be that the optimization target itself has a ceiling. If you're training toward human approval ratings from a broad population, you converge toward what median preference selects for. The plateau is baked into what you're measuring against.