Declarative workflows are such a good idea, fantastic, and I love the AI-first principle that creating and editing the pipeline can be done with AI too.
The declarative style keeps the workflow detail at a high enough level to iterate on super quickly - love that. More important to me is that it's structured and looks like it would be more testable (I see validation in your docs).
Zooming in on the pipe/agent steps, I can't quite tell whether you can leverage MCP as a client and make tool calls? Can you confirm? If not, what's your solution for working with APIs in the middle of your pipeline?
Also a quick question: declarative workflows won't change the fact that LLM output is non-deterministic, so we can't always be guaranteed that the output from prior steps will be correct. What tools or techniques are you using or recommending to measure the reliability of the output from prior steps? I'm thinking of how you might measure at the step level to help prioritise which prompts need refinement or optimisation. Is this a problem you expect to own in Pipelex, or one to be solved elsewhere?
Great job, guys. Your approach looks like the right way to solve this problem and add some reliability to this space. Thanks for sharing!
Hi Clafferty, providing an MCP server was a no-brainer and we have the first version available. But you're right, using "MCP as a client" is a question we have started asking ourselves too. We haven't had the time to experiment yet, so no definitive answer for now. In the meantime, we have a type of pipe called PipeFunc which can call a Python function, and so potentially any kind of tool under the hood. But that is really a makeshift solution, and we are eager to get your point of view and discuss with the community to get it right.
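To give a concrete sense of what we mean by that makeshift approach, here is a rough sketch of a function step wrapping an external API call. This is only an illustration of the idea, not the actual Pipelex API: the function, endpoint, and how the step gets registered are all hypothetical.

```python
# Illustration only: the endpoint and wiring are hypothetical, not Pipelex's API.
# The idea is that a function step wraps any API call, so a workflow can reach
# external tools even without a native MCP client.
import requests  # any HTTP client would work here


def fetch_weather(city: str) -> dict:
    """A plain Python function used as a pipeline step to call an external API."""
    response = requests.get(
        "https://api.example.com/weather",  # placeholder endpoint
        params={"city": city},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()

# The workflow would invoke this function as a step, taking the output of the
# previous pipe as its argument and passing the returned dict to the next pipe.
```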
Many companies are working on evals, and we will have a strategy for integrating them with Pipelex. What we already have is modularity: you can test each pipe separately or test a whole workflow, which is pretty convenient. Better yet, we have the "conceptual" level of abstraction: the code is the documentation. So you don't need any additional work to explain to an eval system what is expected at each workflow step: it's already written into it. We even plan to have an option (typically for debug mode) that checks that every input and every output complies semantically with what was intended and expected.
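As a rough illustration of that per-step checking idea (not Pipelex code: the model and step names here are made up), validating each step's output against a declared schema is one simple way to get a reliability signal per pipe and spot which prompts need refinement. A semantic check would go further, e.g. with an LLM judge, but the structural part can look like this:

```python
# Illustrative sketch, not Pipelex's implementation: the schema and step name
# are hypothetical. Each step's raw output is checked against the structure
# declared for that step, and failures are logged per step.
from pydantic import BaseModel, ValidationError


class InvoiceSummary(BaseModel):
    vendor: str
    total_amount: float
    currency: str


def check_step_output(step_name: str, raw_output: dict) -> bool:
    """Return True if the step's output matches its declared structure."""
    try:
        InvoiceSummary.model_validate(raw_output)
        return True
    except ValidationError as err:
        # Counting failures per step gives a simple reliability metric
        # for prioritising which prompts to refine or optimise.
        print(f"[{step_name}] output failed validation: {err}")
        return False
```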
Thanks a lot for your feedback! It took a lot of work, so it's greatly appreciated.
+1 for MCP client support. I’ve backed off MCP after an initial burst of enthusiasm specifically because the client ecosystem is limited.