I have been using it for less than an hour, so take this with a grain of salt: I'm still excited about the new tech.

In a project like mine (https://github.com/tsz-org/tsz) I was constantly frustrated that models were not doing enough research and were not taking other situations into account. Again and again, models would produce code that fixed one thing and broke two other tests that were "unrelated".

With Fable it seems like tasks are taking much longer (I have not seen a pull request from Fable sessions yet), but reading the transcripts of those sessions I can see that it is doing the right thing by leaving no stone unturned.

As the article says, it's hard to communicate this "feeling" about models because it is very project-specific, but I thought I'd share.

Doesn't this suggest that the project might not be structured in a way that allows features to be added incrementally?

In general, sooner or later you need to restructure one thing or another as requirements change. Good code lets you reason about a refactoring, and experience tells you when it is necessary or appropriate. Coding agents aren't very good at the latter.

The setup is solid. There are thousands of tests, and CI won't let things merge if tests are failing.

But overall, it is pretty normal for compilers to have this sort of "unexpected" test failure caused by work in some area. It happened to me back in the day when I was coding everything manually, too.

> there are thousands of tests and CI won't let things merge if tests are failing.

That's not what a clean setup means... I mean good separation of concerns, established invariants, etc.

A compiler and type checker is a very special case, where you can fix something in the lexer or parser and break something else in the AST walker, etc. tsz is well architected, but those things can happen if you're not careful, and that's precisely what I meant in my original comment. Fable can think about how changing the parser can impact the checker, etc.
