I think that over time people will start looking at AI-assisted coding the same way we now look at loosely typed code, or at (heavy) frameworks: it saves time in the short term, but may cause significant problems down the line. Whether or not this tradeoff makes sense in a specific situation is a matter of debate, and there's usually no obviously right or wrong answer.

Once the free money runs out, the AI companies may shift to producing heavily verified code snippets with more direct language control. That would genuinely simplify a lot of boilerplate, instead of chasing the fairy tale of some AGI coding wizard.

Isn't the boilerplate that "AI" is capable of generating becoming more and more dated with each passing day?

Are the AI firms capable of retraining their models to understand new features in the technologies we work with? Or are LLMs going to be stuck generating circa-2022 boilerplate forever?

No to the first question, and maybe with a lot of money for the second question.

In the 20 years I've been in the industry, boilerplate has dropped dramatically in the backend.

Right now, front end has tons of boilerplate. It's one of the reasons AI has such a wow factor for FE: trivial tasks require a lot of code.

But even that is much better than it was 10 years ago.

That was a long way of saying I disagree with your no.

FE has a lot of boilerplate only if you’re starting from scratch every single time. That’s why we had template systems and why we invented view libraries. Once you’ve defined your libraries, you just copy-paste stuff.

It seems like they should be able to "overweight" newer training data. But the risk is that the newer training data will skew more towards AI slop than the older data.
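Mechanically, overweighting could be as simple as recency-weighted sampling when assembling training batches. A toy sketch in Python; the corpus fields, half-life, and cutoff year here are made up purely for illustration:

    import random

    # Toy corpus: each example carries the year it was written (fields are hypothetical).
    corpus = [
        {"code": "var x = require('left-pad');",   "year": 2016},
        {"code": "const app = express();",         "year": 2020},
        {"code": "const data = await fetch(url);", "year": 2024},
    ]

    def recency_weight(year, half_life=2.0, now=2025):
        # Exponential decay: an example loses half its weight every `half_life` years.
        return 0.5 ** ((now - year) / half_life)

    weights = [recency_weight(ex["year"]) for ex in corpus]

    # Sample a batch biased toward newer code; older examples still appear, just less often.
    batch = random.choices(corpus, weights=weights, k=2)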

There won't ever be newer training data.

The OG data came from sites like Stack Overflow. These sites will stop existing once LLMs become better and easier to use. Game over.

Every time Claude Code runs tests or builds after a change, it's collecting training data.

Has Anthropic been able to leverage this training data successfully?

I can't pretend to know how things work internally, but I would expect it to be involved in model updates.

You need human-language, programming-related questions to train on too, not just the code.

That's what the related chats are for?

I mean, if people continue checking open-source code that uses those new features into GitHub, then the models should be able to learn them just fine.

This is only true if there continues to be tremendous amounts of money/hardware/power available to perform the training, in perpetuity.

It really depends on the situation. I think there's an argument for generating in a lower-level, strongly typed language, where most of the work of writing the pointlessly verbose parts is eliminated, any errors are caught by the compiler immediately, and you still keep the option of handwritten optimizations where needed. Sort of like how one can drop down to C from Python for the parts that need more performance.
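A minimal sketch of that "drop down to C" pattern, assuming a hypothetical hot loop compiled to a shared library; the function name and build command are illustrative, not anything from this thread:

    # hot_loop.c (hypothetical), kept in C because the loop is performance-critical:
    #
    #     long sum_squares(long n) {
    #         long total = 0;
    #         for (long i = 0; i < n; i++) total += i * i;
    #         return total;
    #     }
    #
    # Built once with:  cc -O2 -shared -fPIC hot_loop.c -o hot_loop.so
    import ctypes

    lib = ctypes.CDLL("./hot_loop.so")          # load the compiled C code
    lib.sum_squares.argtypes = [ctypes.c_long]  # declare the C signature for ctypes
    lib.sum_squares.restype = ctypes.c_long

    def sum_squares(n: int) -> int:
        # Readable Python interface on top, C speed underneath.
        return lib.sum_squares(n)

    print(sum_squares(10_000_000))

The same division of labor would apply to generated code: let the tooling handle the verbose, mechanical layer, and keep the hand-tuned parts small and isolated.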