I rewrote a simple RAG ingestion pipeline from Python to Go.

It reads from a database. Generates embeddings. Writes them to a vector database.

  - ~10X faster
  - ~10X lower memory usage

The only problem is that you have to spend a lot of time figuring out how to do it.

All the instructions on the internet, and even the vector databases' own documentation, are in Python.

If speed and memory use aren't a bottleneck, then "a lot of time figuring out how to do it" is probably the biggest cost for the company. These pipelines can generally be run offline, and memory is fairly cheap: a month of a machine with a ton of RAM costs about as much as one hour of the time of a developer who knows how to do this. That's why Python is so popular.

>I rewrote a simple RAG ingestion pipeline from Python to Go

I also wrote a RAG pipeline in Go, using OpenSearch for hybrid search (full-text + semantic) and the OpenAI API. I reused OpenSearch because our product was already using it for other purposes, and it supports vector search.

For me, the hardest part was figuring out all the additional settings and knobs in OpenSearch to achieve around 90% successful retrieval, as well as determining the right prompt and various settings for the LLM. I've found that these settings can be very sensitive to the type of data you're applying RAG to. I'm not sure if there's a Python library that solves this out of the box without requiring manual tuning too.

> I'm not sure if there's a Python library that solves this out of the box without requiring manual tuning too

There are Python libraries that simplify the task by giving your problem a better structure. The knobs will be fewer and higher-level.