Recently I've been working on a project that lets users analyze and rate bid documents against a single tender document. The general logic is quite similar to the cv-job_offer example in Pipelex. The challenge I've run into is that both the tender and the bid documents are very large: it's impossible to load even a single one into the LLM context, let alone both. So I have to design a workflow that extracts structured information and evaluation criteria from the tender document into predefined data models, and then dispatches multiple tasks to evaluate each bid document against those data models.
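
To make the shape of this concrete, here is a rough sketch of the two-pass workflow in plain Python. Every name in it (`call_llm`, `split_chunks`, `Criterion`, and so on) is made up for illustration; this is not Pipelex syntax, just the structure I have in mind:

```python
# Hypothetical sketch: extract structured criteria from the tender once,
# then dispatch one small evaluation task per criterion per bid chunk.
import json
from dataclasses import dataclass

@dataclass
class Criterion:
    name: str
    description: str
    weight: float

def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM call (any provider)."""
    return "[]"  # placeholder response

def split_chunks(text: str, size: int = 8000) -> list[str]:
    """Naive fixed-size split so no single prompt exceeds the context window."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def extract_criteria(tender_text: str) -> list[Criterion]:
    """Pass 1: read the tender chunk by chunk, accumulating criteria as JSON."""
    criteria: list[Criterion] = []
    for part in split_chunks(tender_text):
        raw = call_llm(
            "Extract evaluation criteria from this tender excerpt as a JSON "
            f"list of {{name, description, weight}} objects:\n{part}"
        )
        criteria += [Criterion(**item) for item in json.loads(raw)]
    return criteria

def evaluate_bid(bid_text: str, criteria: list[Criterion]) -> dict[str, list[str]]:
    """Pass 2: one focused question per criterion per chunk, so each call stays small."""
    findings: dict[str, list[str]] = {c.name: [] for c in criteria}
    for part in split_chunks(bid_text):
        for c in criteria:
            findings[c.name].append(call_llm(
                f"Assess how this bid excerpt addresses '{c.description}':\n{part}"
            ))
    return findings
```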

I'm wondering whether this kind of scenario (essentially, the input documents are just too big) can be handled in Pipelex. My understanding is that a DSL is good for its high-level abstraction and readability, but it trades away flexibility and power. How can Pipelex users iterate on their pipelines to meet complex needs when the business logic inevitably becomes complex?

When the business logic becomes complex, our motto is to break it down into smaller problems until they are manageable.

Regarding large docs, I know what you mean and we've been there: at one point we considered focusing on that feature, building a kind of agentic system to master large docs. At the time, in 2023-2024, everyone was relying on vector-store RAG and embeddings to solve that problem, but we always thought that approach wouldn't bring enough reliability. We wanted the LLM to actually read the whole document, in order to know what's in there and where. The idea was to read and "take notes" according to different aspects of what it's reading, and to synthesize hierarchically from bottom to top. That part of the work could be done once, and the structured "notes" could then be exploited for each use case pretty efficiently, thanks to the zoom-in/zoom-out provided by the architecture.

We went pretty far in building that solution, which we called "Pyramid RAG" internally, and the aim was to build AI workflows on top of it. But at some point, building the pyramid became pretty complex, and we realized we needed AI workflows to build the pyramid itself, and we needed to do it right. That's when we decided to focus on what would become Pipelex. Now, in our new setting, solving the "large docs" problem is a large problem we can break down into a collection of Pipelex workflows. So it's part of our plan to provide those building blocks, and we hope to involve the community as much as possible, to cover all the possible applications.
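
For readers who want the shape of that bottom-up synthesis, here is a minimal sketch in plain Python. It's a compressed illustration of the idea, with made-up names (`call_llm`, `build_pyramid`), not our actual Pyramid RAG code:

```python
# Hypothetical sketch of the "pyramid": take notes on leaf chunks, then
# recursively synthesize groups of notes, keeping every level so a later
# workflow can zoom in (leaves) or out (root).
def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM call."""
    return "summary"  # placeholder response

def build_pyramid(chunks: list[str], fan_in: int = 5) -> list[list[str]]:
    """Return all levels, from the leaf notes up to a single root synthesis."""
    level = [call_llm(f"Take structured notes on:\n{c}") for c in chunks]
    levels = [level]
    while len(level) > 1:
        # Merge each group of `fan_in` notes into one higher-level note.
        level = [
            call_llm("Synthesize these notes:\n" + "\n---\n".join(group))
            for group in (level[i:i + fan_in] for i in range(0, len(level), fan_in))
        ]
        levels.append(level)
    return levels  # levels[0] = leaves, levels[-1][0] = top of the pyramid
```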

> Now, in our new setting, solving the "large docs" problem is a large problem we can break down into a collection of Pipelex workflows.

I think a solution to this problem, shipped as a runnable example, would be a nice showcase of what is achievable with Pipelex.

Working on it.

A very thoughtful and inspiring answer. I think we are on the same road, which makes things much clearer to me. Looking forward to seeing the Pipelex solution to these kinds of problems.