The workflow orchestration tools I've used in the past have been consistently unsatisfying, so I've been building my own. With Coflux (https://coflux.com), workflows are defined declaratively in plain Python with decorated functions (tasks). Workers connect to the orchestration server and get assigned work - calls to other tasks get intercepted and then re-scheduled by the server onto (potentially different) workers. Tasks are executed in pre-warmed, isolated processes, with low latency.
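To make the interception idea concrete, here is a minimal sketch in plain Python. It does not use Coflux's actual API: `ToyScheduler` and its `task` decorator are hypothetical names, and a local thread pool stands in for the server and its workers. It only shows how a decorator can turn an ordinary function call into a dispatch through a scheduler.

```python
import functools
from concurrent.futures import ThreadPoolExecutor


class ToyScheduler:
    """Hypothetical stand-in for an orchestration server (not Coflux's API)."""

    def __init__(self, workers=4):
        # The thread pool plays the role of connected workers.
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self.log = []  # names of tasks in the order they were dispatched

    def task(self, fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # The call is intercepted here instead of running inline:
            # it is recorded, handed to a worker, and the caller waits.
            self.log.append(fn.__name__)
            return self._pool.submit(fn, *args, **kwargs).result()

        return wrapper


scheduler = ToyScheduler()


@scheduler.task
def double(x):
    return x * 2


@scheduler.task
def workflow(x):
    # These look like plain function calls but go through the scheduler.
    return double(x) + double(x + 1)


result = workflow(3)
```

In this toy version a blocked caller holds on to its worker thread, so deeply nested calls could exhaust the pool. It is only meant to show the shape of the idea, not how a real server schedules work.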
Beyond standard features (retries, caching, timeouts - enabled with attributes on the decorator), Coflux supports more novel features - like suspense (where a task can choose to go to sleep and get restarted when a result it depends on becomes available), memoisation (where steps within a run are aggressively cached so that you can re-run steps in a workflow without re-running upstream steps), and the ability to re-run a step in a different workspace (with updated code, or in a different environment).
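As a rough illustration of the memoisation idea (not Coflux's implementation or API), the sketch below caches step results inside a single run, keyed by step name and arguments. The `Run` class, its `step` decorator, and `invalidate` are all hypothetical names made up for this example. Re-running a downstream step then reuses the cached upstream result.

```python
import functools


class Run:
    """Hypothetical per-run cache of step results (not Coflux's API)."""

    def __init__(self):
        self.results = {}     # (step name, args) -> cached result
        self.executions = []  # names of steps that actually executed

    def step(self, fn):
        @functools.wraps(fn)
        def wrapper(*args):
            key = (fn.__name__, args)
            if key not in self.results:
                # Cache miss: execute the step and remember its result.
                self.executions.append(fn.__name__)
                self.results[key] = fn(*args)
            return self.results[key]

        return wrapper

    def invalidate(self, name):
        # Drop cached results for one step so that it runs again.
        self.results = {k: v for k, v in self.results.items() if k[0] != name}


run = Run()


@run.step
def fetch(n):
    return list(range(n))


@run.step
def total(n):
    return sum(fetch(n))


first = total(4)
run.invalidate("total")  # re-run only the downstream step
second = total(4)        # fetch(4) is served from the cache
```

After the second call, `run.executions` shows that `total` ran twice while `fetch` ran only once.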
It turns out this works great for implementing agentic systems - you can provide references to tasks as tools to an LLM call and have the AI drive, and tasks can be easily sandboxed. Claude is also very capable of using the CLI to interact with the orchestration server to submit workflows, investigate failed runs, make updates to workflows and re-run steps.
I'm trying to make sure it's easy to try out - there's a self-contained CLI that can be used to start the server (a single Docker container), run worker processes, and then interact with the server. The dev mode automatically restarts the workers as you make local changes. There's also a hosted UI for observing runs in real-time, where you can see the execution graph, access logs/metrics/assets/etc - it works without creating an account - the browser interacts with your orchestration server directly.
Sounds and looks very similar to Prefect. What does Coflux do differently from Prefect?