I can't even get Claude or GPT-5 to consistently produce good flows for common use cases, much less domain-specific shit. They have a deep vocabulary though, which makes them sound better informed than they are.
They are very good at writing code and debugging visible errors, but that's like 50% the harness.