I am obviously paraphrasing, but the general idea is that trying to synthesize style from a codebase into e.g. a markdown guide generally doesn't work very well. What achieves style transfer is providing the model with a lot of examples of the style, conventions, patterns you want.

To put it in practice: if you point claude/codex to a repository and you ask it to implement feature X using style guide Y, the code will probably work, but you can usually get better results by saying "do it in the style of this file, it was done well there".

Right more simply put it's great at being a copy cat, exploring similar data points that match your token needs.

It is not great at decision making or judgment calls that don't have a well defined spec or plan in place yet; like unofficial or unapproved tokens if you will. A lot of this stuff simply never has had specs as it has been internal to how companies work and their secret sauce.

The closest thing we have are governance and compliance policies due to legal/business needs requiring it so it's far more well documented than operational ones in how we work. It is more about the how versus the what here I guess is what I'm saying.

But yeah this is why it does great when there are tests, design systems, evals, and other artifacts to mirror. Far more reckless and unpredictable without these things, but still great for exploration and finding the data output you seek.

Doesn't that make sense? Its text prediction. If you give it examples, it can predict. Synthesizing "put semi-colons on new lines" requires it to generate its own examples 'in its head' (so to speak) and remember that. It won't.

It's like when I see people feeding it a whole bunch of "best practices" and expect it to follow them. It won't. But you could ask it questions about the best practices all day long.

Yes, exactly. Any engineer deep on this stuff right now understands that grounded predictive engine sprinkled with RL training and are discovering what that means in terms of its strengths and weaknesses for company use.

I ran into similar issues as we started to roll out LLM generated financials in our org.. I’m so used to the old SQL workflow of “grab this data from this table, that data from that table, combine it into a final result that looks like xxxx” where the tables were outputs from reports in our ERP but I was having terrible results.

Ended up pointing Claude at a few sample files from our existing reporting, gave it read-only oauth access to the ERP and said “build a new report showing the cash by project as calculated by xxxx - yyyy + zzzz in the style of the existing reports” and it basically one-shot from there.

Kind of crazy and I built a bunch of redundant check-sums because I honestly didn’t think it would be able to replace like 6 workdays of effort for the 2 FTEs who generate that kind of thing manually every month but so far so good..

[flagged]

[flagged]