This is good for applications where a background queue based RAG is acceptable. You upload a file, set the expectation to the user that you're processing it and needs more time for a few hours and then after X hours you deliver them. Great for manuals, documentation and larger content.

But for on-demand, near instant RAG (like say in a chat application), this won't work. Speed vs accuracy vs cost. Cost will be a really big one.

If you have a lot of time, cost on a local machine may be low.

[dead]