An ecosystem is being built around AI: best prompting practices, MCPs, skills, IDE integration, how to build a feedback loop so the LLM can test its output on its own, plugging into the outside world with browser extensions, etc.
For now I think people can still catch up quickly, but at the end of 2026 it's probably going to be a different story.
Okay, end of 2026 then what? No one ever learns how to use the tools after that? No one gets a job until the pre-2026 generation dies?
For now I think people can still catch up quickly, but at the end of 2027 it's probably going to be a different story.
> probably going to be a different story
Can you elaborate? Skill in AI use will be a differentiator?
Yes.
At some point you will need to combine multiple skills together:
- communication
- engineering skills (understanding requirements, finding edge cases, etc)
- architectural proficiency
- prompting
- agentic workflows and skills
- context management
- and yes, proper old fashioned coding skills to keep things tidy and consistent
> Best prompting practices, MCPs, skills, IDE integration, how to build a feedback loop so the LLM can test its output on its own, plugging into the outside world with browser extensions, etc.
Ah yes, an ecosystem fundamentally built on probabilistic quicksand, where even with the "best prompting practices" you still get agents violating the basics of security and committing API keys when they were told not to. [0]
[0] https://xcancel.com/valigo/status/2009764793251664279
One of the skills needed to effectively use AI for code is to know that telling AI "don't commit secrets" is not a reliable strategy.
Design your secrets to include a common prefix, then use deterministic scanning tools like git hooks to prevent them from being checked in.
Or have a git hook that knows which environment variables contain secrets and checks for those values.
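Something like this minimal pre-commit hook sketch for the prefix approach (the "mycorp_sk_" prefix is hypothetical; swap in whatever convention you actually issue keys with):

    #!/usr/bin/env python3
    # Minimal sketch of a .git/hooks/pre-commit hook that rejects staged changes
    # containing secrets. Assumes a hypothetical convention where every secret is
    # issued with the prefix "mycorp_sk_"; adjust the pattern to your own scheme.
    import re
    import subprocess
    import sys

    SECRET_PATTERN = re.compile(r"mycorp_sk_[A-Za-z0-9]+")  # hypothetical prefix

    def staged_added_lines():
        """Yield (file, line) for every line added in the staged diff."""
        diff = subprocess.run(
            ["git", "diff", "--cached", "-U0"],
            capture_output=True, text=True, check=True,
        ).stdout
        current_file = None
        for line in diff.splitlines():
            if line.startswith("+++ b/"):
                current_file = line[len("+++ b/"):]
            elif line.startswith("+") and not line.startswith("+++"):
                yield current_file, line[1:]

    def main():
        hits = [(f, l) for f, l in staged_added_lines() if SECRET_PATTERN.search(l)]
        if hits:
            for f, l in hits:
                print(f"possible secret in {f}: {l.strip()}", file=sys.stderr)
            print("commit blocked: load the secret from the environment instead",
                  file=sys.stderr)
            sys.exit(1)

    if __name__ == "__main__":
        main()

The point is that the check doesn't care how the secret got into the diff: a prompt, a pasted config, or an agent being careless.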
That's such an incredibly basic concept, surely AIs have evolved to the point where you don't need to explicitly state those requirements anywhere?
They can still make mistakes.
For example, what if your code (which the LLM hasn't reviewed yet) has a dumb feature where it dumps environment variables to log output, and the LLM runs "./server --log debug-issue-144.log" and then commits that log file as part of a larger piece of work you asked it to perform?
If you don't want a bad thing to happen, adding a deterministic check that prevents the bad thing from happening is a better strategy than prompting models or hoping that they'll get "smarter" in the future.
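A minimal sketch of that kind of deterministic check, assuming a hypothetical list of secret-bearing environment variable names (use your own): it blocks the commit if the literal value of any of them appears anywhere in the staged diff, which would catch the accidentally committed debug log above.

    #!/usr/bin/env python3
    # Sketch of a pre-commit hook that blocks the commit if the value of any
    # known secret environment variable shows up in the staged content.
    # The variable names below are hypothetical examples; list your own.
    import os
    import subprocess
    import sys

    SECRET_ENV_VARS = ["OPENAI_API_KEY", "AWS_SECRET_ACCESS_KEY", "DATABASE_PASSWORD"]

    def main():
        secrets = {name: os.environ[name]
                   for name in SECRET_ENV_VARS if os.environ.get(name)}
        if not secrets:
            return  # nothing to check against on this machine

        # Full text of the staged diff, including newly added files such as logs.
        staged = subprocess.run(
            ["git", "diff", "--cached"],
            capture_output=True, text=True, check=True,
        ).stdout

        leaked = [name for name, value in secrets.items() if value in staged]
        if leaked:
            print(f"commit blocked: staged changes contain the value of {', '.join(leaked)}",
                  file=sys.stderr)
            sys.exit(1)

    if __name__ == "__main__":
        main()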
Part of why these things feel "not fit for purpose" is that they don't include the things Simon has spent three years learning? (I know someone else who's doing multi-LLM development where he uses job-specialty descriptions for each "team member" that lets them spend context on different aspects of the problem; it's a fascinating exercise to watch, but it feels even more like "if this is how the tools should be used, why don't they just work that way"?)
Doesn't seem to work for humans all the time either.
Some of this negativity I think is due to unrealistic expectations of perfection.
Use the same guardrails you should already be using for human-generated code and you should be fine.
I have tons of examples of AI not committing secrets. this is one screenshot from twitter? I don’t think it makes your point
CPUs are billions of transistors. sometimes one fails and things still work. “probabilistic quicksand” isn’t the dig you think it is to people who know how this stuff works
I have tons of examples of drivers not running into objects.
like my other comment, my point is one screenshot from twitter vs one anecdote. neither proves anything. cool snarky response though!
> I have tons of examples of AI not committing secrets.
"Trust only me bro".
It takes 10 seconds to find the many examples of API keys + prompts on GitHub that verify that tweet. The issue with AI isn't limited to that tweet, which demonstrates its probabilistic nature; otherwise why do we need a sandbox to run the agent in the first place?
Nevermind, we know why: Many [0] such [1] cases [2]
> CPUs are billions of transistors. sometimes one fails and things still work. “probabilistic quicksand” isn’t the dig you think it is to people who know how this stuff works
Except you just made a false equivalence. CPUs can be tested and verified transparently, and even if one does go wrong, we know exactly why. Whereas you can't explain why the LLM hallucinated or decided to delete your home folder, because the way it predicts its output is fundamentally stochastic.
[0] https://old.reddit.com/r/ClaudeAI/comments/1pgxckk/claude_cl...
[1] https://old.reddit.com/r/ClaudeAI/comments/1jfidvb/claude_tr...
[2] https://www.google.com/search?q=ai+deleted+files+site%3Anews...
you could find tons of API keys on GitHub before these “agentic” tools too. that was my point, one screenshot from twitter vs one anecdote from me. I don’t think either proves the point, but posting a screenshot from twitter like it’s proof of some widespread problem is what I was responding to (N=2, 1 vs 1)
my point is more “skill issue” than “trust me this never happens”
my point on CPUs is people who don’t understand LLMs talk like “hallucinations” are a real thing — LLMs are “deciding” to make stuff up rather than just predicting the next token. yes it’s probabilistic, so is practically everything else at scale. yet it works and here we are. can you really explain in detail how everything you use works? I’m guessing I can explain failure modes of agentic systems (and how to avoid them so you don’t look silly on twitter/github) and how neural networks work better than most people can explain the technology they use every day
> you could find tons of API keys on GitHub before these “agentic” tools too. that was my point, one screenshot from twitter vs one anecdote from me. I don’t think either proves the point, but posting a screenshot from twitter like it’s proof of some widespread problem is what I was responding to (N=2, 1 vs 1)
That doesn't refute the probabilistic nature of LLMs despite best prompting practices. In fact it emphasises it. More like your 1 anecdotal example vs my 20+ examples on GitHub.
My point is that not only does it indeed happen, but an old issue is now made even worse and more widespread, since we now have vibe-coders without security best practices assuming the agent should know better (when it doesn't).
> my point is more “skill issue” than “trust me this never happens”
So those with this "skill issue" are also the ones prompting the AI differently, then? Either way, this just inadvertently proves my whole point.
> yes it’s probabilistic, so is practically everything else at scale. yet it works and here we are.
The additional problem is: can you explain why it went wrong as you scale the technology? CPU circuit designs go through formal verification, and if a fault happens we know exactly why; the design is deterministic, which makes them reliable.
LLMs are not, and don't have this. That is why OpenAI had to describe ChatGPT's misaligned behaviour as "sycophancy" but could not explain why it happened, beyond pointing to the hyper-parameter tweaks that got them that result.
So LLMs being fundamentally probabilistic, and hence harder to explain, is why you get that screenshot of vibe-coders who somehow prompted it wrong and the agent committed the keys.
Maybe that would never have happened to you, but it won't be the last time we see this on GitHub.
I was pointing out one screenshot from twitter isn’t proof of anything just to be clear; it’s a silly way to make a point.
yes AI makes leaking keys on GH more prevalent, but so what? it’s the same problem as before with roughly the same solution
I’m saying neural networks being probabilistic doesn’t matter — everything is probabilistic. you can still practically use the tools to great effect, just like we use everything else that has underlying probabilities
OpenAI did not have to describe it as sycophancy, they chose to, and I’d contend it was a stupid choice
and yes, you can explain what went wrong just like you can with CPUs. we don’t (usually) talk about quantum-level physics when discussing CPUs; talking about neurons in LLMs is the wrong level of abstraction
> I was pointing out one screenshot from twitter isn’t proof of anything just to be clear; it’s a silly way to make a point.
Versus your anecdote being proof of what? A skill issue for vibe coders? Someone else prompting it wrong?
You do realize you are proving my entire point?
> yes AI makes leaking keys on GH more prevalent, but so what? it’s the same problem as before with roughly the same solution
Again, this reinforces my point: it makes the existing issue even worse. Additionally, that wasn't even the only point I made on the subject.
> I’m saying neural networks being probabilistic doesn’t matter — everything is probabilistic.
When you scale neural networks up to, say, production-grade LLMs, then it does matter. Just like it matters for CPUs to be reliable when you scale them out in production-grade data centers.
But your earlier (fallacious) comparison ignores the reliability differences between CPUs and LLMs: determinism is a hard requirement for that reliability, and LLMs are not deterministic.
> OpenAI did not have to describe it as sycophancy, they chose to, and I’d contend it was a stupid choice
For the press, they had to, but no one knows the real reason, because it is unexplainable, which goes back to my other point on reliability.
> and yes, you can explain what went wrong just like you can with CPUs. we don’t (usually) talk about quantum-level physics when discussing CPUs; talking about neurons in LLMs is the wrong level of abstraction
It is indeed the wrong level for LLMs, because not even the researchers can practically explain why any given neuron in the network takes different values on every fine-tune or training run. Even if the model is "good enough", it can still go wrong at inference time for unexplainable reasons beyond "it overfitted".
CPUs, on the other hand, have formal verification methods that check the design conforms to its specification, so we can trust that they work as intended and can diagnose problems accurately without going into atomic-level details.
…what is your point exactly (and concisely)? I’m saying it doesn’t matter it’s probabilistic, everything is, the tech is still useful
No one is arguing that it isn't useful. The problem is this:
> I’m saying it doesn’t matter it’s probabilistic, everything is,
Maybe it doesn't matter for you, but it generally does matter.
The risk of a technology failing is far higher if its behaviour is random and unexplainable than if it is predictable, verified and explainable. The former rules out many serious use-cases.
This is why your CPU, or GPU works.
LLMs are not deterministic, have no formal verification, and are fundamentally black boxes.
That is why so many vibe-coders have reported "the AI deleted my entire home folder" issues even when they only told it to move a file or folder to another location.
If it did not matter, why do you need sandboxes for the agents in the first place?
I think we agree then? the tech is useful; you need systems around them (like sandboxes and commit hooks that prevent leaking secrets) to use them effectively (along with learned skills)
very little software (or hardware) used in production is formally verified. tons of non-deterministic software (including neural networks) are operating in production just fine, including in heavily regulated sectors (banking, health care)
> I think we agree then? the tech is useful; you need systems around them (like sandboxes and commit hooks that prevent leaking secrets) to use them effectively (along with learned skills)
No.
> very little software (or hardware) used in production is formally verified. tons of non-deterministic software (including neural networks) are operating in production just fine, including in heavily regulated sectors (banking, health care)
The issue is what happens when it all goes wrong.
You have to explain exactly why a system failed in heavily regulated sectors.
Saying 'everything is probabilistic' as the cause of an issue is a non-answer if you are a chip designer, air traffic controller, investment banker or medical doctor.
So your point does not follow.
that’s not what I said. you honestly seem like you just want to argue about stuff (e.g. not elaborating on the “no” when I basically repeated and agreed with what you said). and you seem to consistently miss my point (in the second part of your response; I’m saying these non-deterministic neural networks are already widespread in industry with these regulations, and it’s fine. they can be explained despite your repeated assertions they cannot be. also the entire point on CPUs which you may have noticed I dropped from my responses because you seemed distracted arguing about it). this is not productive and we’re both clearly stubborn, glhf
> that’s not what I said. you honestly seem like you just want to argue about stuff (e.g. not elaborating on the “no” when I basically repeated and agreed with what you said). and you seem to consistently miss my point
I have repeated myself many times, and you continue to ignore the reliability points that inherently impede LLMs in many use-cases and exclude them from areas where predictability is required in critical production systems.
Vibe coders can use them, but the gulf between "useful for prototyping" and "useful for production" is riddled with hard obstacles, because software like LLMs is fundamentally unpredictable and the risks are therefore far greater.
> I’m saying these non-deterministic neural networks are already widespread in industry with these regulations, and it’s fine.
So when a neural network scales beyond hundreds of layers and billions of parameters, the equivalent of a production-grade LLM, how exactly is such a black box explainable when it messes up and goes wrong?
> they can be explained despite your repeated assertions they cannot be.
With what methods exactly?
Early on, I pointed to formal verification and testing of CPUs as the way to explain what went wrong at scale. You have provided nothing equivalent for LLMs to back your assertion other than "they can be explained", without any evidence.
> also the entire point on CPUs which you may have noticed I dropped from my responses because you seemed distracted arguing about it). this is not productive and we’re both clearly stubborn, glhf
You did not make any point with that, as it was a false equivalence, and I explained why the reliability of a CPU isn't the same as the reliability of an LLM.