> The multipart streaming workload is inherently expensive.

A streaming workload is not inherently expensive. The main work is to get the bytes of the files to the network card as quickly as possible; almost no computation needs to be performed.

> The cost of generating boundaries and constructing headers scales with request count and payload size.

The only computation needed to generate a boundary is to ensure that the chosen boundary does not occur in the content, and the code does not even appear to check this: it just generates a random UUID4. Boundaries and headers are per-file and could be cached, so they don't scale with request count or payload size.
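For concreteness, here is a hedged sketch (not the project's actual code) of what the missing check would look like; it is cheap, which is part of the point:

```python
import uuid

def safe_boundary(payload: bytes) -> str:
    """Pick a random boundary and re-roll until it does not occur in the payload.

    A UUID4 hex string colliding with arbitrary content is astronomically
    unlikely, which is presumably why many implementations skip the check
    entirely and just trust the random value.
    """
    while True:
        boundary = uuid.uuid4().hex
        if boundary.encode("ascii") not in payload:
            return boundary
```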

You're right about the boundary caching opportunities. The computational cost I'm referring to comes from constructing the per-request multipart headers and performing file metadata operations, rather than from boundary generation alone.

The "inherently expensive" claim was overstated: the work is expensive relative to static file serving, and unavoidable, but you're correct that the current implementation leaves room for optimization. I've identified three areas to improve in the design: boundary generation, content type assignment, and header construction.
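On the header-construction point, a part header depends only on the file's name (and the boundary in use), so it can be computed once and reused across requests; a sketch of the caching idea, not Axon's actual API:

```python
import mimetypes
from functools import lru_cache

@lru_cache(maxsize=1024)
def part_header(filename: str, boundary: str) -> bytes:
    """Build and memoize the multipart part header for one file.

    Keyed on (filename, boundary): as long as the boundary is reused
    (e.g. one per server process), repeat requests for the same file
    pay zero header-construction cost.
    """
    ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    return (
        f"--{boundary}\r\n"
        f'Content-Disposition: attachment; filename="{filename}"\r\n'
        f"Content-Type: {ctype}\r\n\r\n"
    ).encode("ascii")
```

Content type assignment folds into the same cache, since it is a pure function of the filename.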

One clarification: the dynamic bundling via query parameters limits traditional caching strategies. Since each request can specify a different combination of files, the overall multipart response structure must be assembled per request rather than cached whole.

Axon is also meant to be a core framework. How you implement caching depends on your specific use case; it becomes business logic rather than request-processing logic. Its minimal tooling is intended as a feature, though, as you have pointed out, it can also be limiting.

You sound very much like an LLM