This “short leash” seems like more of a crutch to me, and a sign of not giving the AI enough detail on the problem to begin with, or not reviewing and iterating on its output.

Hand-holding great models like Fable through implementation is a waste of time, and a waste of Fable. You can have increasingly nuanced discussions with stronger models, and they write a lot better code than they used to. The process of discussing designs and their implementations, questioning things that look weird to you, and actually reading the AI’s responses also helps to find better solutions.

For example, one time I wanted to write a greedy solver for a problem, and in my discussion with Opus on the idea it suggested using an existing MILP library to solve the problem exactly. I’d never even heard of MILP, but my final implementation ended up being better and simpler than what I’d have done alone.

You say you can have increasingly nuanced discussions with stronger models.

What I say is, when I asked Claude why he applied a certain change I didn't understand, and boy, it was a small change, he said he "reasoned from first principles" based on the code paths. But it didn't work, and when I asked, "Okay, describe the steps of your reasoning from first principles," it literally answered that it had just made it up.

So, nuanced discussions with models, I don't buy it.

You can never ask why a model did a certain thing, or what it was "thinking" when it said something - just like you can't ask a human which neurons were firing when they had a certain thought. The information just isn't available at that level.

You absolutely can have deep nuanced discussions with LLMs however, you just need to better understand their strengths and weaknesses.

A human won't respond with "Neuron 10-100 of the frontal cortex" (jokes aside) with deceptively convincing confidence.

The human will quite convincingly be able to construct a post-hoc reasoning on an action that may or may not be related at all to what was actually going through their head or the actual instinctual reasons that led to a decision.

Humans can accurately retell what their consciousness was doing, but they have no clue why their unconsciousness responded as it did.

LLM is just that unconsciousness part that humans have to post hoc explain like that, and lacks the conscious part that we humans actually can inspect in ourselves.

If the AI had some introspection part where it actually tracks its reasoning maybe it would be closer to conscious humans. Its too expensive to do that everywhere ofc, not even us humans tracks everything like that, just a tiny bit, but tracking that tiny bit is enough for so much error correction to happen.

"Humans can accurately retell what their consciousness was doing" is often not true, because of complex mechanisms. The feeling of shame alone can make it very hard for someone to accurately describe how the arrived at the wrong conclusion.

Plus it's an open question if this is even a thing. Does consciousness consist of constructing actions beforehand, or of construction justifications afterward?

Frankly, my opinion is that DNA is incredible at choose the most energy efficient/cheap option, and the cheaper option is definitely justifications afterward.

I feel strengthened by psychological experiments where people are shown fake events involving them, where they then "explain their (nonexistent) reasoning at the time".

Arguments for the idea that the human consciousness/soul is something that is emergent keep getting shouted down though. Even though if you take the extreme opposite: it's obviously wrong. Nobody has ever cut open a human skull (or anything else) and found a soul. So somehow it's constructed from very non-conscious components we don't understand, it's not "actually there" in a real sense.

Sufficiently constrained post-hoc justifications are indistinguishable from explanations. Consciousness tries to make things up, it learns that people notice this, it then begins trying to construct justifications that won't be predictably called out as false. Eventually it learns how its unconscious operates, and how to interrogate it, and its post-hoc justifications, at least in the common cases, become reliable.

>Consciousness tries to make things up, it learns that people notice this, it then begins trying to construct justifications that won't be predictably called out as false.

There's a logical "skip" between that and

>Eventually it learns how its unconscious operates, and how to interrogate it, and its post-hoc justifications, at least in the common cases, become reliable.

The brain constructs a narrative that won't be called out as false, one that provides social capital, makes one feel good about oneself, is consistent with all your other justifications, etc. It's only an assumption that this process would naturally converge on Truth, and considering it's massively-multiplayer chaos where brains coordinate their stories in complex ways, my assumption is that this would converge on *stability*, not truth.

Yep. It converges on truth unless there's a strong reward for lies because truth is easy. It's a neural network. It just reads off/probes the internal state because that's the cheapest way to model the unconscious. The justification won't necessarily be true, mind, in terms of the labels it puts, but it should mostly be true structurally- behaviorally predictive in the ordinary domain.

(Even if you are incentivized to lie and flatter yourself, it is still helpful to have access to the true signal internally, because that way you can know how to structure your lie to best avoid detection.)

>Eventually it learns how its unconscious operates

I mean, no we don't, both in a personal way and in a global scientific understanding.

What you're saying happens is a set of socially consistent and acceptable responses based upon general human knowledge at the time. The common cases aren't exactly reliable, it's that they are repeatable in the sense they cover what we expect, and tend to explode when the world is less predictable.

This is why the scientific method changed the world, because we started writing shit down, comparing notes, and striving for repeatability.

I think a better way of putting this is that humans think they can accurately re-tell what their consciousness was doing. Whether they actually can, or even if consciousness exists at all as a thing outside the perception of consciousness is a philosophical question currently beyond answering.

I wonder if monte carlo tree search could play a role in reasoning. I'm searching and it seems to come up in arxiv papers, so the idea is not dead. I'll look more into this after writing this comment..

> Humans can accurately retell what their consciousness was doing

Can they? How could we possibly know this is the case? People could simply post-hoc rationalize this to justify whatever decision they made.

That's exactly what the LLM seems to have done as well. The problem is that we want and even expect the A.I to be truthful.

Isn’t that part of what the think blocks are for? Yea, don’t inject them back into the context, but do log them for review of that train of thought… no?

You don't get access to the thinking traces. Might work with local models tho, but the current <thinking/> meta isn't particularly suited for this either, as it's a big blob of rambling surfaced by RL, with the "only" objective being that the thinking blob somehow leads to a better final answer. Something more detailed, using templates akin to oAI's harmony could work, provided there's also a step that teaches the models to reflect on the various thinking channels, and maybe surface bits and pieces to include in "skills" or "learnings".

That's true, but it does mean that the LLM itself actually does have access to those thinking traces and could therefore, at least in principle, answer what it was thinking. They're probably not trained to do that, though.

It depends. Up until recently the models were trained only to "think" on the last user message. So you'd send the message1, got back reply1 w/ think1 but you'd make the next iteration m1 - r1 - m2, and would get back reply2 w/ think2. You would not add the thinking1. That's how the models were trained, and that's how you were supposed to construct the conversation.

Now recently some things have changed, and you can add the thinking part (you get that encrypted from the closed API labs). But the model needs to have been trained for this to work. And doing it this way you'll burn through tokens faster, as the thinking parts are usually rather long.

You certainly can ask it what it was thinking, the problem is just that it's more likely to make up a plausible sounding fabrication than to say "I don't know" or "my reasoning is hidden for business reasons" (frontier models hide a lot of their chain of thought). Which is the fundamental problem with LLMs though, if the data doesn't exist or it's sparse they make things up.

Choosing plausible sounding fabrication over an admission of ignorance is not an uncommon modality among the human beings I interact with, so I'm not surprised this pattern is found in models trained on human interactions.

Totally fine. Then let's just not pretend these "AI"s are somehow better at it.

That's the whole problem with all of these discussions. It's whataboutism and "You're holding it wrong" allegations.

So you're saying I can absolutely have a deep, nuanced discussion with an LLM, as long as I don't ask how he arrived at his conclusions?

You can also have a deep nuanced discussion with a rubber duck as long as you don't ask any questions it needs to respond to.

Have you not seen all the posts with claims that AI lies about its reasoning when asked to explain how it arrived at the output?

I would instead ask the model to explain how X works, whether it achieves Y, and why we cannot do Z instead.

That is how you have a discussion with the AI.

You can have a nuanced discussion with an LLM. But LLMs also have failure modes where they start making up justifications. The two are not mutually exclusive.

>as long as I don't ask how he arrived at his conclusions?

So just the average US political discussion with a human then?

[deleted]

> You can never ask why a model did a certain thing

Of course you can! It might be following outdated docs or read something in legacy code and tried to follow that pattern and it'll tell you as much if you ask it in a way that actually gets you the reason instead of it thinking it needs to immediately fix the mistake.

Dude, these two things are not at all analogous:

1. Asking a model why it did a certain thing, and

2. Expecting a human to say which neuron fired in their response.

Even asking a human being why they did a certain thing is questionable. The research on choice blindness seems like a pretty definitive debunking of post-hoc rationalization:

https://en.wikipedia.org/wiki/Introspection_illusion#Choice_...

I'm not sure what point you're trying to make. In science and engineering, being able to provide justification is a core skill. The comparison we should be making is against the human practitioners who are trained in their fields. There will always be a distribution of ability. Saying that there's evidence that people are capable of providing post-hoc rationalization doesn't say anything about the ability of experts to produce well thought out responses (in their respective fields) that don't immediately fall apart under scrutiny.

Structured thinking and deliberation are indeed important, but you can also make LLMs do structured "thinking" if you work hard enough, and generate quite plausible reasoned arguments with valid real-world results, and you can get them to write down their working as they go. But as research has shown, it's not "true" thinking, just pattern matching at a higher level, and eventually runs out of steam.[0]

But you only have to drill down a couple more layers and you are back in the void again; do you have any proof that your own thinking, no matter how structured and accurate, is anything other than pattern-matching at a sufficiently much higher level at which you are incapable of seeing it as such?

I think we will be finding some very interesting things out soon using the combination of LLMs and theorem provers, as demonstrated by Terence Tao's recent work.[1]

A cheetah is not a motorbike is not an aircraft is not a rocket.

[0] https://arxiv.org/abs/2506.06941

[1] https://arxiv.org/abs/2603.12744

"Nuanced discussion" doesn't necessarily mean the sort one would have with a human. Statistical apologies are never going to be meaningful. One could edit nonsense into the context window and the model would attempt to rationalize it. The models are smart but you need to use them in a way that makes sense for what they are.

"Nuanced discussions" is more about describing a design to a model, asking the model to critique your design and ask you for clarifications, and then you providing those clarifications and the model "getting it" and proceeding to additional levels of detail before implementation. In particular the models being able to highlight concerns you have not yet thought about is a pretty good sign of this. Fable is noticeably better at this compared to Opus.

I was not talking about models making mistakes. Mistakes, and then models making up justifications for those mistakes, is a failure mode of any LLM, and Fable is no different in that regard. Newer models might make less mistakes, or at least make less egregious mistakes, but they still make mistakes.

Posts like this are meaningless without more context - the model you're using, the harness, the initial prompt and context.

Fable is better than most staff engineers at my FAANG.

Maybe I’m missing something, but he talks about charm and tasks (repos on his GitHub). Charm being his harness, and tasks being one of his skills. Idk, maybe I’m mistaken from reading the article…

https://github.com/taoeffect

> Fable is better than most staff engineers at my FAANG.

While this wouldn’t entirely surprise me, my experience is just not that. Using Claude and fable, it regularly (poorly) recreates features that exist inside our codebase. Sure, I could give way more initial context but at a certain point I’ve given so much context that I would have been faster writing the code myself, or I could have literally handed it to even a fresh graduate to write.

> Fable is better than most staff engineers at my FAANG.

That’s genuinely disturbing.

But staff engineers take "responsibility"

Including you?

Fable will definitely be the one on call when it inevitably breaks down from the pile of shit slop it wrote at 5AM, don't worry <3

We already use AI for oncall and it works better than our humans most of the time.

> he

:/

[flagged]

We can point out mistakes that feel rather grating without assuming intent behind them.

I agree that their use of "he" is likely because they're not a native speaker, especially because they're arguing against the capabilities of LLMs.

That doesn't make it inherently wrong to point out the mistake when it's so intertwined with the deeper discussion here, especially given the fact that some (hopefully few) people do build relationships with LLMs.

> turd bucket autist

I’d be more willing to engage with your argument in good faith without inflammatory language like this. Try and meet people where they are and these conversations become easier.

[deleted]

[flagged]

I’d prefer kindness and good faith when talking to strangers, but maybe my expectations are too high.

Do you think you’ll change someone’s mind by being an asshole? Rarely works.

You’re in agreement with me, call these toxic language police jerks out as soon they have keyboard spasms.

[deleted]

That may be true but it’s still capable of nuanced discussions.

I tend to agree,

If you have invested significantly in the planning phase and there is momentum in the architecture and conventions that already exist in the project, the implementation phase might not need as much oversight as is suggested here.

> You can discover that your initial idea was dumb and a better one exists

The planning and architecture phase is usually where I make these types of discovery at a high level.

> Your agent might go “off the rails” and start doing something you don’t want it to do

Candidly these orthogonal, inadvertent edits aren't as bad as they once were and for impactful changes there should be at least some test coverage, even if that test coverage is just "freezing" what was implemented.

As you mentioned the final review discussion is a good chance to verify beyond what review or adversarial review agents find.

I think the obvious solution here is to beef up the test side of the app, much more than when writing code by hand. Tests represent project knowledge in executable format. The LLM does not need to be careful to remember every detail of the tests. You don't need to vet every small interaction, it automates review work as well.

Even better if the project was built from the start to be easier to test and observe. But my golden rule remains - no code without tests, expand test suite all the time.

I agree, human-steered, AI-implemented test cases can at least capture the acceptance criteria.

It's then more efficient to inspect if existing test cases are being modified as part of the delivery of something new and inspect why.

I am a bit confused which part you disagree with specifically. Reading AI responses and reviewing code seems to be what you propose as well.

Your example with MLIP is something that would not be prevented by this approach, during the planing phase, it would surface.

I guess the devil is in the details and the way you prompt it for starting the task matters.

But IMO you absolutely need to check the output, need to engage with what the model is doing, need to probe why something is built the way the model tries to build it.

I disagree with keeping an eye on the model as it is working, approving every command, and denying and stopping the model when you think it has gone wrong. It is not that it is actively harmful to do this, but rather that it is a waste of time and you can avoid the need for it through better design discussions and review.

Micro-managing and keeping the AI on a "short leash" also lends itself better to telling models to do smaller units of work at a time instead of discussing broader design concerns. That is why I think someone doing this would miss the MILP solution, because they might never discuss the overall design with the model but rather just tell it what to implement next.

I personally am somewhere between you and the author. I don't check _all_ the intermediary steps, but I do try to understand what it's doing [1] and follow the process. Mostly I let it do the changes itself without supervision at each step but when a coherent "chunk" of work is done, I go through it really thoroughly. In almost 90% of the cases after a chunk is done some adjustments are needed.

I find broad architectural design to be _better_ if you follow along in the process because you better understand the direction it's going earlier and you can shift the high level direction much earlier. Even if you check its steps, you can ask it for its take on high-level architectural aspects along the way, no problem. I think personal touch matters a lot though, because I naturally ask it and try to get the big picture image.

[1] I actually find it really instructive what tooling it uses to tackle a problem, I got to become a much better console user because of it

I agree. Better to let it rip in a sandbox then spend your time correcting the finished product.

Waste of time being in the middle.

The article feels like micromanaging AI. If you think about it like a junior employee, micromanaging them will mean they end up doing the work you want and do it your way. But they won't bring any of their ideas to the table, which in the long run could be beneficial to everyone on the team.

This is the method I use.

It makes sure that I understand everything being generated and that I maintain a firm working knowledge of the codebase at all times.

I can easily steer it too.

[deleted]

Do you have a background in CS or optimization? MILP is a pretty standard concept in algorithms/optimization. So this example doesn't really convince me that the AI reached some unusually superior conclusion. It sounds more like it suggested a well-known technique that you personally hadn't encountered. Useful, yes, but that seems more about background knowledge gaps than about the merits of letting the tool run unconstrained.

There are always concepts that some people think are a basic, that others haven't heard of. The entire benefit here is that AI can point out what we miss. There are certainly techniques you don't know about, or just didn't think to apply to a problem, that others would find to be pretty standard.