Anecdotal, but here's my experience.
For personal stuff I use forgecode with openrouter. Firstly, forgecode is a much better harness than Claude Code (IMHO).
Anyway, regarding the models, my experience is that there is not much difference in terms of quality, but the cost difference is insane. At least for how I use agents. Yesterday's example: I am developing a small DSL for searching across complex technical documents. I wanted to add a small operator to it and thought I'd give Fable a spin. It burned through 13 USD, and while it delivered a solution, it wasn't objectively better than what Deepseek v4 did for 1.7 USD (same exact task, because I was curious).
For full disclosure, I ask agents for piecemeal stuff. In the DSL case, I designed the operators and then asked agents to implement them one by one. If I asked for the whole thing to be designed starting from these complex documents, Fable would probably shine, but every time I try to give agents broader-scope tasks they burn through millions of tokens and generate questionable code, which I then have to spend time familiarizing myself with.
I also build a lot of DSLs as an architecture pattern. I'd be curious to know what stack you're using for this and how you're approaching it.
I am getting familiar with Rust and so I have been playing around with Quoth (https://github.com/sam0x17/quoth) for now.
It is very basic and I am no DSL expert, but my idea was to build a graph from those complex documents (maintenance manuals) and use that to decide what tools can be used for a given part on a given piece of equipment in a given situation. If there is a path from A to Z, it means you can use that tool given the circumstances. Basically, the DSL is about pruning the graph as you specify things. I could very well have done without it, but it is a fun project for trying out Rust, so I said, why not :)
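To make the pruning idea concrete, here is a rough sketch in plain Rust (standard library only, no Quoth, no parsing). Everything in it is made up for illustration: the `Graph` and `Edge` types, the `requires` attribute pairs, and the node names are assumptions, not the actual project. Each edge carries the conditions under which it holds; specifying a fact drops every edge that contradicts it, and a tool is usable if a path to the goal node survives.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// An edge that is only valid under certain conditions.
/// `requires` holds hypothetical (attribute, value) pairs such as ("equipment", "pump").
#[derive(Debug, Clone)]
struct Edge {
    to: &'static str,
    requires: Vec<(&'static str, &'static str)>,
}

#[derive(Default)]
struct Graph {
    edges: HashMap<&'static str, Vec<Edge>>,
}

impl Graph {
    fn add_edge(
        &mut self,
        from: &'static str,
        to: &'static str,
        requires: &[(&'static str, &'static str)],
    ) {
        self.edges
            .entry(from)
            .or_default()
            .push(Edge { to, requires: requires.to_vec() });
    }

    /// Return a copy of the graph without the edges contradicted by `facts`.
    /// An attribute that is not specified in `facts` does not prune anything.
    fn prune(&self, facts: &HashMap<&str, &str>) -> Graph {
        let mut pruned = Graph::default();
        for (from, edges) in &self.edges {
            for edge in edges {
                let compatible = edge.requires.iter().all(|&(key, value)| {
                    facts.get(key).map_or(true, |&given| given == value)
                });
                if compatible {
                    pruned.edges.entry(*from).or_default().push(edge.clone());
                }
            }
        }
        pruned
    }

    /// Breadth-first search: is `to` reachable from `from`?
    fn has_path(&self, from: &str, to: &str) -> bool {
        let mut seen: HashSet<&str> = HashSet::new();
        let mut queue: VecDeque<&str> = VecDeque::new();
        seen.insert(from);
        queue.push_back(from);
        while let Some(node) = queue.pop_front() {
            if node == to {
                return true;
            }
            if let Some(edges) = self.edges.get(node) {
                for edge in edges {
                    // `insert` returns true only the first time a node is seen.
                    if seen.insert(edge.to) {
                        queue.push_back(edge.to);
                    }
                }
            }
        }
        false
    }
}

/// Toy data standing in for what would be extracted from a maintenance manual.
fn example_graph() -> Graph {
    let mut g = Graph::default();
    g.add_edge("torque_wrench", "bolt_m8", &[]);
    g.add_edge("bolt_m8", "pump_housing", &[("equipment", "pump")]);
    g.add_edge("pump_housing", "allowed", &[("state", "depressurized")]);
    g
}
```

Calling `example_graph().prune(&facts).has_path("torque_wrench", "allowed")` from a `main` answers the "can I use this tool here" question for a given set of facts. Copying the graph on every prune is wasteful but keeps the sketch short; a real version would more likely filter edges during the traversal.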