If you honestly don’t believe there is a major difference between 3.x and 4.7, I don’t think there is much anyone can do to convince you. I do find it disappointing when technical professionals are so uninterested in building a real understanding of a fairly complex topic.

> I see no reason to believe it's going to get better. Waving hands more forcefully isn't helping, there's no argument behind the promise of "it will get better".

That’s a real bummer to read from someone who sounds like a professional, and not only a professional but someone thoughtful and smart. Thirty years of brilliant work in RL, Bayesian stats, machine learning, and measurement, then trillions of dollars of funding and some of the best talent in the world, and your assertion is “I tried it on my codebase and I didn’t like it, and that trumps literally entire fields of mathematics and statistics.” I mean, have you heard of the Chinchilla scaling laws? Do you know how RL works? Are you aware of benchmarks and their strengths and weaknesses? Are you following adoption numbers, or accomplishments like new proofs of unsolved Erdős problems?

> But, more importantly, the AI is applied on a level where really important things don't happen. It's automating boilerplate work.

Your experiences are your experiences; I don’t know what work you do, how it gets done, or what languages you’re working with. But we’re literally at the point where the vast majority of code at major tech companies is fully AI-written (not just AI-assisted).

> It's not the kind of decision it's called to make and nobody sensible would trust it to make such a decision because there isn't a clear right or wrong answer

What exactly are you claiming an AI fundamentally cannot do here that a human can? People make judgement calls on ambiguous problems, taking into account vast amounts of context about the business, dev time, reliability, maintenance, and so on; why do you think an AI can’t do that?

What's up with the buzzword bragging?

You don't know buzzword A, B, C? Heh, he must be incompetent and know nothing.

The buzzwords mean nothing, really. The math is the same for a stupid or a smart model, because the model is trying to mimic properties of the training dataset.
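
To be concrete about “the math is the same”: assuming the commenter means standard autoregressive language models, every one of them, large or small, is trained on the same next-token cross-entropy objective against the training distribution. A minimal statement:

```latex
% The training objective is identical regardless of parameter count or
% architecture: minimize the expected negative log-likelihood of the
% training data D under the model p_theta (i.e., mimic the dataset).
\[
  \mathcal{L}(\theta)
    = -\,\mathbb{E}_{x \sim \mathcal{D}} \sum_{t=1}^{|x|}
      \log p_\theta\!\left(x_t \mid x_{<t}\right)
\]
```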

You can give me the ultimate model architecture, one that beats every model in existence, and I can still figure out a way to make it perform worse than what’s available today. But you’re not even doing that; you’re just drumming up some old news.

If someone “threatened” me with tech advancements, I would be more worried about things like an imminent massive drop in token costs for bigger context windows, or other game changers like continual learning, where the model internalizes your code base into its weights rather than just keeping it in its context.
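
As a rough illustration of that continual-learning idea (a sketch, not any shipped product’s method): fine-tune a causal LM on local source files so the code base ends up in the weights instead of the context window. The model name and repo path below are hypothetical placeholders, and a real system would also need replay or regularization to avoid catastrophic forgetting.

```python
# Minimal sketch: continual fine-tuning on a local codebase, so the model
# "internalizes" the code into its weights rather than reading it from context.
from pathlib import Path

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM checkpoint works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Gather the codebase as plain text ("my_repo" is a hypothetical path).
corpus = "\n\n".join(p.read_text() for p in Path("my_repo").rglob("*.py"))
tokens = tokenizer(corpus, return_tensors="pt", truncation=True, max_length=1024)

model.train()
for step in range(100):  # repeated passes over one slice, purely illustrative
    # labels=input_ids makes the library compute the shifted next-token loss
    outputs = model(input_ids=tokens["input_ids"], labels=tokens["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```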

It’s not buzzword bragging; these are the prerequisites for a coherent conversation. If someone doesn’t know what the Chinchilla scaling laws are, a discussion about “I think things are saturated” is not grounded in anything. It’s like sitting around debating quantum mechanics when you don’t know the math; it’s just meaningless. If these sound like buzzwords, the implication is not “you’re an idiot”, it’s “you are not yet informed on the key basics of the discussion”, and that is something you can fix with curiosity and a couple of prompts to ChatGPT to speed up the learning curve. It’s not like any of this stuff is gatekept.
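
For reference, the “scaling laws” in question are a concrete, falsifiable formula, not a slogan. Here’s a sketch of the parametric loss fit from the Chinchilla paper (Hoffmann et al., 2022); the constants are the paper’s reported fit and should be treated as approximate:

```python
# Chinchilla parametric loss fit: L(N, D) = E + A / N**alpha + B / D**beta,
# where N is parameter count and D is training tokens. Constants are the
# values reported in Hoffmann et al. (2022); treat them as approximate.
def chinchilla_loss(N: float, D: float) -> float:
    """Predicted pretraining loss for N parameters and D training tokens."""
    E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28
    return E + A / N**alpha + B / D**beta

# E.g., Chinchilla itself: ~70B parameters trained on ~1.4T tokens.
print(chinchilla_loss(70e9, 1.4e12))  # ~1.94
```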

> You can give me the ultimate model architecture, one that beats every model in existence, and I can still figure out a way to make it perform worse than what’s available today. But you’re not even doing that; you’re just drumming up some old news.

Sorry, I don’t understand what you’re saying here. What is the old news? Yes, you can break new models, but what point are you trying to make?

> If someone “threatened” me with tech advancements, I would be more worried about things like an imminent massive drop in token costs for bigger context windows, or other game changers like continual learning, where the model internalizes your code base into its weights rather than just keeping it in its context.

I also don’t really know what point you’re trying to make here. Token cost drops seem like a good thing? Bigger context windows too? Are we saying the same thing?