As has been said, actual evals are needed here.

Anecdotally, the worst and most common failure mode of an agent is when it starts spinning its wheels: unproductively trying to fix some error, failing, iterating wildly, and eventually landing on a bullshit “solution”, if any.

In my experience, in TypeScript, these “spin out” situations are almost always type-related and often involve a lot of really horrible “any” casts.

Right, I've noticed agents are very trigger happy with 'any'.

I have had a good time with Rust. It's not nearly as easy to skirt the type system in Rust, and I suspect the culture is also more disciplined when it comes to 'unwrap' and proper error management. I find I don't have to explicitly say "stop using unwrap" nearly as often as I have to say "stop using any".

Experienced devs coming into TypeScript are also trigger happy with 'any' until they work out what's going on. Especially if they've come from JavaScript.

I’ve tried enforcing no-explicit-any just to have the agent disable the linter rule. I guess I didn’t say you couldn’t do that…
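
For reference, the enforcement itself is just a lint rule set to error; here is a minimal sketch with ESLint's flat config, assuming the typescript-eslint package is installed (noInlineConfig at least kills the inline `// eslint-disable-next-line` escape hatch, though nothing stops the agent from editing this file itself):

    // eslint.config.mjs: minimal sketch, assumes the typescript-eslint package
    import tseslint from 'typescript-eslint';

    export default tseslint.config(
      ...tseslint.configs.recommended,
      {
        rules: {
          // hard error rather than a warning, so the agent can't quietly ship `any`
          '@typescript-eslint/no-explicit-any': 'error',
        },
        linterOptions: {
          // ignore inline /* eslint-disable */ comments entirely
          noInlineConfig: true,
          reportUnusedDisableDirectives: 'error',
        },
      },
    );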

LLMs are minimizing energy to solve problems, and if they can convince the human to go away happy with 'any', so be it.

There's a fine line between gradient descent, pedantry, and mocking. I suspect we will learn more about it.

The question can be asked two ways:

(1) Are current LLMs better at vibe coding typed languages, under some assumptions about user workflow?

(2) Are LLMs as a technology more suited to typed languages in principle, and should RL pipelines gravitate that way?

This is why I have a very specific ruleset and linting setup for my LLMs: no 'any' allowed at all, plus other quality checks.

Is this a shareable ruleset? I would completely understand if not, but I’m interested in learning new ways to interact with my tools.

I tend to have three layers of "rulesets", one general one I reuse across almost any coding task (https://gist.github.com/victorb/1fe62fe7b80a64fc5b446f82d313...), then language specific ones, and finally project specific ones. Concat them before injecting into the system prompt.
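
The concatenation step itself is nothing fancy; a rough sketch of it (the layer paths and output file name are made up, adjust to wherever your agent reads its instructions from):

    // build-rules.ts: rough sketch; layer paths and output file are assumptions
    import { readFileSync, writeFileSync } from 'node:fs';

    const layers = [
      'rules/general.md',       // reused across almost any coding task
      'rules/typescript.md',    // language-specific rules
      'rules/this-project.md',  // project-specific rules
    ];

    // concatenate the layers in order, most general first
    const systemPrompt = layers
      .map((path) => readFileSync(path, 'utf8').trim())
      .join('\n\n');

    // write wherever your agent picks up its system prompt from
    writeFileSync('AGENTS.md', systemPrompt + '\n');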

Second this method.

The one thing I would really recommend adding to your constraints is Don't Repeat Yourself: always check whether something already exists. LLMs like to duplicate functionality, even if it's already in their context.

Can I ask why you have asked it to avoid abstractions? My experience has been that the old rules, such as avoid premature abstraction or premature optimization, don't apply as cleanly because of how ephemeral and easy to write the actual code is. I now ask the LLM to anticipate the space of future features and design modular abstractions that maximize extensibility.

> Can I ask why you have asked it to avoid abstractions?

Some models like to add abstractions regardless of their usefulness (Google's models seem excessively prone to this for some reason), so I ended up having to prompt that behaviour away so I can come up with whatever abstractions are actually needed. The rules in that gist are basically just my own coding guidelines, put in a way that LLMs can understand them; when I program "manually" I program pretty much that way.

I have yet to find any model that can properly plan feature implementations or come up with sound designs, abstractions included, so that's something I do myself, at least for now; the system prompts mostly reflect that workflow too.

> because of how ephemeral and easy to write the actual code is

The code I produce isn't ephemeral by any measure of that word I understand; anything I end up using stays where it is until it gets modified. I'm not doing "vibe coding", which it seems you are, so you might need some different prompts for that.

Until the agent disables the linter rule without you noticing!

Yup. I've watched both Claude and especially Gemini get frustrated trying to deal with my pre-commit checks (usually mypy) and decide to do `git commit -n` (i.e. --no-verify), even though my rules state explicitly, multiple times, that it's never okay to bypass the pre-commit checks.

I know you are joking, but I have the checks injected into the tools they use; they run automatically every time the agent runs a command to write, update, etc. I can configure them to block the file edits completely, or just to return feedback after each one. Access outside the codebase is restricted, but of course they could find a loophole to hack the whole thing, or they could just get frustrated and run a recursive loop script that would crash my computer :)
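
As a rough sketch of the blocking variant (this is not any particular agent's real hook API; the assumption is that the write/edit tool shells out to a script like this and rejects the edit on a non-zero exit, feeding the output back to the model):

    // check-edit.ts: rough sketch of an edit-blocking hook, names are assumptions
    import { execFileSync } from 'node:child_process';

    const file = process.argv[2]; // path the agent just wrote or wants to write
    if (!file) process.exit(1);

    try {
      // run the same quality gates a human commit would hit; fail closed
      execFileSync('npx', ['eslint', '--max-warnings', '0', file], { stdio: 'inherit' });
      execFileSync('npx', ['tsc', '--noEmit'], { stdio: 'inherit' });
    } catch {
      // non-zero exit tells the wrapping tool to reject the edit
      // and return the linter/compiler output to the model as feedback
      process.exit(2);
    }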

Setting up linting with noExplicitAny is essential. But that won’t stop them from disabling it when they can’t figure something out. They’re sneaky little bastards.

The 'any' type should be forbidden for LLMs, just as it is in compiled typed languages.