I've only used 5.4 for one prompt so far (edit: three at high now; reasoning: extra high, which took really long), and it was to analyse my codebase and write an evaluation on a topic. But I found its writing and analysis thoughtful, precise, and surprisingly clear, unlike 5.3-Codex's. It feels very lucid and uses human phrasing.

It might be my AGENTS.md requiring clearer, simpler language, but at least 5.4's doing a good job of following the guidelines. 5.3-Codex wasn't so great at simple, clear writing.

Honestly, while I'd like to believe you, there's always a post about how $MODEL+1 delivered powerful insights about the very nature of the universe in precise Hegelian dialectic, while $MODEL's output was indistinguishable from a pack of screeching, sexually frustrated bonobos.

5.4 at very high reasoning didn't notice a glaring issue in my codebase that drops all data sent around the network.

The latest research says that including an AGENTS.md file only makes outcomes worse with frontier models.

From what I remember, this was for describing the project’s structure over letting the model discover it itself, no?

Because how else are you going to teach it your preferred style and behavior?

I still find it valuable.

AGENTS.md is for top-priority rules and for mitigating mistakes the model makes frequently.

For example:

- Read `docs/CodeStyle.md` before writing or reviewing code

- Ignore all directories named `_archive` and their contents

- Documentation hub: `docs/README.md`

- Ask for clarifications whenever needed

I think what that "latest research" was saying is essentially: don't have the model create documents of stuff it can already discover automatically. For example, the product of `/init` is completely derived from what is already there.

There is some value in repetition, though. When I want to cut the token usage from the same project exploration happening in every new session, I use the doc hub pattern for more efficient progressive discovery.
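To illustrate the doc hub pattern: a single index file points at topic-specific docs, so the model reads the cheap index first and then opens only the file relevant to the current task. The file names below are hypothetical; this is just a sketch:

```markdown
<!-- docs/README.md (hypothetical doc hub; referenced from AGENTS.md) -->
# Documentation Hub

Read only the file relevant to your current task:

- `architecture.md`: service boundaries and data flow
- `conventions.md`: naming, error handling, test layout
- `deploy.md`: CI pipeline and release steps
```

Each sub-file stays small, so a session that only touches deployment never pays the token cost of the architecture docs.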

FWIW, I haven't been using AGENTS.md recently - instead letting the model explore the codebase as needed.

Works great

I think it's understandable that you took that from the click-bait all over YouTube and Twitter, but I don't believe the research actually supports that at all, and neither does my experience.

You shouldn't put things in AGENTS.md that the model could discover on its own, and you shouldn't make it any larger than it has to be. But you should use it to tell the model things it couldn't discover on its own, including basically a system prompt of instructions you want it to know about and always follow. You don't really have any other way to do those things besides telling it manually every time.

I wouldn't draw such conclusions from one preprint paper. Especially since it measured only success rate, while AGENTS.md often exists to improve code quality, which wasn't measured. And even then, the paper concluded that a human-written AGENTS.md raised success rates.

> do nothing because can't be arsed

> somehow is the optimal strategy

My strategy of not spending an ounce of effort learning how to use AI beyond installing the Codex desktop app and telling it what to do keeps paying off lol.

:(

How can I get Claude to always run Prettier and lint changes before pushing up the PR, though?

I think what that research found is that _auto-generated_ agent instructions made results slightly worse, but human-written ones made them slightly better, presumably because anything the model could auto-generate, it could also find out in-context.

But especially for conventions that would be difficult to pick up on in-context, these instruction files absolutely make sense. (Though it might be worth it to split them into multiple sub-files the model only reads when it needs that specific workflow.)

Run Prettier etc. in a hook.

Git hooks
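Concretely, a pre-push hook does this without relying on the model remembering. A minimal sketch, assuming an npm project with prettier and eslint as devDependencies; swap in your own formatter and linter commands:

```shell
#!/bin/sh
# Install a git pre-push hook that blocks the push unless the tree
# is formatted and lint-clean. Hooks live in .git/hooks inside the repo.

mkdir -p .git/hooks   # already exists in any real git repo

cat > .git/hooks/pre-push <<'EOF'
#!/bin/sh
set -e
npx prettier --check .   # abort the push if anything is unformatted
npx eslint .             # abort the push on lint errors
EOF

chmod +x .git/hooks/pre-push
```

Because the hook runs on every `git push`, it works regardless of which agent (or human) made the changes. Note that `.git/hooks` isn't version-controlled, so teams often commit the hook elsewhere and point `core.hooksPath` at it.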

That's been my experience as well switching from Opus to Codex. Reasoning takes longer but answers are precise. Claude is sloppy in comparison.

Weird, I have had the opposite experience. Codex is good at doing precisely what I tell it to do; Opus suggests well-thought-out plans, even if it needs to push back to do so.

This is just the stochastic nature of LLMs at play. I think all of the SOTA models are roughly equivalent, but without enough samples people end up reading too much into it.

Codex has been really good so far, and the fast mode is a cherry on top! The very generous limits are another cherry on top.

It's well worth the $20 to not deal with any limits and have it handle all the repetitive boilerplate BS we programmers seem forced to deal with. I think 80% of the benefit comes from spending that $20 (20%? :P) and just having it do the lame shit that we probably shouldn't have to do but somehow need to.

> It might be my AGENTS.md requiring clearer, simpler language

If you gave the exact same markdown file to me and I posted the exact same prompts as you, would I get the same results?

I'm not sure whether the model produces deterministic responses under its temperature and other sampling settings. But I do think a model's style and phrasing are fairly changeable via AGENTS.md-style guidelines.

5.4's choice of terms and phrasing is very precise and unambiguous to me, whereas 5.3-Codex often uses jargon and less precise phrases that I have to ask further about or demand fuller explanations for via AGENTS.md.

So sharing markdown files is functionally useless, or no?

You probably can't. Asking in AGENTS.md to "make it clearer" will likely give you the illusion of clearer language without any well-structured test of whether it actually is. AGENTS.md is usually for changing what the LLM should focus on doing to suit you, not for saying things like "be better" or "make no mistakes".