I've been building systems like what the OP is using since GPT-3 came out.

This is the honeymoon phase. You're learning the ins and outs of the specific model you're using and becoming more productive. It's magical. Nothing can stop you. Eventually you're not improving as fast as you did at the start, but things are getting better every day. Or maybe every week. But it's heaps better than doing it by hand because you have so much mental capacity left.

Then a new release comes out. An arbitrary fraction of your hard-earned intuition is not only useless but actively harmful to getting good results with the new models. Worse, you will never know which part it is without unlearning everything you learned and starting over again.

I've had to learn the quirks of three generations of frontier model families now. It's not worth the hassle. I've gone back to managing the context window in Emacs because I can't be bothered to learn how to deal with another model family that will be thrown out in six months. Copy and paste is the universal interface, and being able to do surgery on the chat history is still better than whatever tooling is out there.

Unironically, learning Vim or Emacs and the standard Unix code tools is still the best thing you can do to level up your LLM usage.

First off, appreciate you sharing your perspective. I just have a few questions.

> I've gone back to managing the context window in Emacs because I can't be bothered to learn how to deal with another model family that will be thrown out in six months.

Can you expand on what you mean by that? I'm a bit of a noob at LLM-enabled dev work. Do you mean that you kick off new sessions and provide context that you manage yourself, instead of relying on a longer-running session to keep relevant information?

> Unironically, learning Vim or Emacs and the standard Unix code tools is still the best thing you can do to level up your LLM usage.

I appreciate your insight, but I'm failing to understand how exactly knowing these tools improves the performance of LLMs. Is it because you can more precisely direct them via prompts?

LLMs work on text and nothing else. There isn't any magic there. Just a limited context window on which the model keeps predicting the next token until it decides it has predicted enough and stops.

All the tooling is there to manage that context for you. It works, to a degree, then stops working. Your intuition is there to decide when it stops working. That intuition gets outdated with each new frontier model release and with every change in the tooling.

The stateless API with a human deciding what to feed it is much more efficient in both cost and time, as long as you're only running a single agent. I've yet to see anyone use multiple agents to generate code successfully (though I have used agent swarms for unstructured knowledge retrieval).
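To make that concrete, here's roughly what the single-agent stateless workflow boils down to. A minimal sketch, assuming the OpenAI Python client; the model name and the pasted file are placeholders I made up:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hand-assembled context: you decide exactly what the model sees.
context = """\
You are reviewing a Python module.

<file: utils.py>
def parse(line):
    return line.strip().split(",")
</file>

Why does parse() break on quoted CSV fields, and what's the minimal fix?
"""

# One stateless request: nothing is remembered between calls.
resp = client.chat.completions.create(
    model="gpt-4o",  # placeholder; any chat-capable model name works here
    messages=[{"role": "user", "content": context}],
)
print(resp.choices[0].message.content)
```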

The Unix tools are there for you to programmatically (or manually) search and edit the code base, and to copy/paste into the context you're about to send. Outside of Emacs (and possibly Vim), with the ability to have dozens of ephemeral buffers open to modify their output, I don't imagine they will be very useful.

Or to quote the SICP lectures: The magic is that there is no magic.

I can't speak for parent, but I use gptel, and it sounds like they do as well. It has a number of features, but primarily it just gives you a chat buffer you can freely edit at any time. That gives you 100% control over the context: you just quickly remove the parts of the conversation where the LLM went off the rails and keep it clean. You can replace or compress the context so far in any way you like.

While I also use LLMs in other ways, this is my core workflow. I quickly get frustrated when I can't _quickly_ modify the context.

If you have some mastery over your editor, you can just run commands, paste relevant output, and make the suggested changes to get an agent-like experience, at a speed not too different from having the agent call tools. But you retain 100% control over the context, and use a tiny fraction of the tokens OpenCode and other agent systems would use.

It's not the only or best way to use LLMs, but I find it incredibly powerful, and it certainly has its place.

A very nice positive effect I noticed personally is that, as opposed to using agents, I actually retain an understanding of the code automatically: I don't have to go in and review the work afterwards, I review and adjust on the fly.

One thing to keep in mind is that the core of an LLM is basically a (non-deterministic) stateless function that takes text as input, and gives text as output.

The chat and session interfaces obscure this, making it look more stateful than it is. But they mainly just send the whole chat so far back to the LLM to get the next response. That's why the context window grows as a chat/session continues. It's also why the answers tend to get worse with longer context windows – you're giving the LLM a lot more to sift through.
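A sketch of what's going on under the hood of those chat interfaces (assuming an OpenAI-style chat completions API; the model name is a placeholder):

```python
from openai import OpenAI

client = OpenAI()
history = []  # the entire "session" lives in this list; the server keeps nothing

def ask(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    resp = client.chat.completions.create(
        model="gpt-4o",    # placeholder model name
        messages=history,  # the WHOLE conversation goes back every turn
    )
    answer = resp.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    return answer

# Each call re-sends everything said so far, so the prompt grows every turn.
print(ask("What does `xargs -0` do?"))
print(ask("Show an example combining it with find."))
```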

You can manage the context window manually instead. You'll potentially lose some efficiencies from prompt caching, but you can also keep your requests much smaller and more relevant, likely spending fewer tokens.
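Manual management just means editing that `history` list yourself before each request. Continuing the sketch above, with `prune` as a hypothetical helper I made up, not part of any library:

```python
# `history` is the same list of {"role": ..., "content": ...} dicts as above.
history = [
    {"role": "user", "content": "Why does parse() drop quoted fields?"},
    {"role": "assistant", "content": "...long dead end about encodings..."},
    {"role": "user", "content": "No, the input is plain ASCII."},
    {"role": "assistant", "content": "Then split(',') is the culprit: ..."},
]

def prune(history, keep_last=2, summary=None):
    """Hypothetical helper: keep an optional hand-written summary plus
    only the last few turns, instead of the whole transcript."""
    kept = history[-keep_last:]
    if summary:
        kept = [{"role": "user", "content": "Summary so far: " + summary}] + kept
    return kept

# Drop the dead end by hand before the next request: smaller, more relevant,
# fewer tokens, at the cost of breaking the provider's prompt cache.
history = prune(history, summary="Debugging parse(); encodings were a dead end.")
```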

> I've been building systems like what the OP is using since GPT-3 came out.

OP is also a founder of HashiCorp, so... lol.

> This is the honeymoon phase.

No offense, but you come across as if you didn't read the article.

You come across as if you didn't read my post.

I'll wait for OP to move their workflow to Claude 7.0 and see if they still feel as bullish on AI tools.

People learning a new AI tool for the first time don't realize that they are just learning the quirks of the tool and the underlying model, not skills that generalize. It's not until you've done it a few times that you realize you've wasted more than 80% of your time on a model that will be sunset in 6 months, making everything you learned useless.