the release and research cycles are still contracting

Not necessarily progress on the benchmarks you would look at for the broader picture (MMLU, etc.)

GPT-3 was an amazing step up from GPT-2, something scientists in the field really thought was at least 10-15 years out, done in 2. Instruct/RHLF for GPTs was a similarly massive splash, making the second half of 2021 equally amazing.

However, nothing since has really been that left-field or unpredictable, and it's been almost 3 years since RHLF hit the field. We knew that good image understanding as input, longer context, and improved prompting would improve results. Releases are frequent, but the progress feels stalled to me.

What has really changed since Davinci-instruct or ChatGPT, in your view? When building an AI-using product, do you construct it differently now? Are agents today anything more than APIs talking to databases with private fields?

In some dimensions I recognize the slowdown in how fast new capabilities develop, but the speed still feels very high:

Image generation suddenly went from gimmick to useful now that prompt adherence is so much better (eagerly waiting for that to land in the API).

Coding performance continues to improve noticeably (for me). Claude 3.7 felt like a big step from 4o/3.5, and Gemini 2.5 in a similar way. Compared to just 6 months ago, I can give it bigger and more complex pieces of work and get relatively good output back. (Net acceleration.)

Audio-to-audio seems like it will be a big step as well. I think it has much more potential than the STT-LLM-TTS architecture commonly used today (latency, quality).
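The latency point can be made concrete with a toy sketch: in a cascaded pipeline the stages run one after another, so their delays stack before the user hears any audio. All the numbers below are made-up illustrative assumptions, not measurements of any real system.

```python
# Hypothetical per-stage latencies (seconds) for a cascaded voice pipeline.
# These figures are illustrative assumptions only.
STAGE_LATENCY = {
    "stt": 0.30,  # speech-to-text: finish transcribing the utterance
    "llm": 0.60,  # language model: time to first response token
    "tts": 0.25,  # text-to-speech: synthesize the first audio chunk
}

def cascaded_response_latency(stages=STAGE_LATENCY):
    """STT -> LLM -> TTS runs sequentially, so the floor on
    time-to-first-audio is the sum of the stage latencies."""
    return sum(stages.values())

def speech_to_speech_latency(model_latency=0.50):
    """A single audio-to-audio model has one stage, so the floor
    is just that model's time to first audio (assumed here)."""
    return model_latency

print(f"cascade:          {cascaded_response_latency():.2f}s")  # 1.15s
print(f"speech-to-speech: {speech_to_speech_latency():.2f}s")   # 0.50s
```

The quality argument is similar: the LLM in the cascade only ever sees a text transcript, so prosody and tone are lost at the STT boundary, whereas an end-to-end audio model can condition on them directly.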

I see huge progress since the first GPT-4 release. The reliability of answers has improved by an order of magnitude. Two years ago, more than half of my questions resulted in incorrect or partially correct answers (most of my queries are about complicated software algorithms or PhD-level research brainstorming). A simple "are you sure?" prompt would force the model to admit it was wrong most of the time. Now, with o1, this almost never happens, and the model seems smarter, or at least more capable, than me in general. GPT-4 was a bright high school student; o1 is a postdoc.

Excuse the pedantry; for those reading, it’s RLHF rather than RHLF.