I understand the methods for addressing the fine-tuning and RAG issues, but I lack the time, and possibly the technical skills, to implement them. Fine-tuning risks degrading a model that already performs well, and RAG is constrained by the context window and may not retrieve all the relevant content. My thinking is that we should vectorize the text and inject those vectors into all layers of the model at inference time. That would sidestep both the context-size limit and the compute cost of fine-tuning, since vectorization itself is fast. I believe this vectorize-and-embed strategy is the answer.
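
For what it's worth, here is a minimal sketch of what that per-layer injection could look like, assuming a GPT-2-style model from Hugging Face transformers. It piggybacks on the `past_key_values` mechanism (the same hook that prefix-tuning uses to prepend per-layer key/value vectors), but feeds it vectors derived from a document embedding instead of trained ones. The `doc_embedding`, `prefix_len`, and `proj` names are placeholders I made up for illustration; in practice the projection would need to be trained or otherwise aligned with the model's representation space, which is the hard part of this idea.

```python
# Hedged sketch: inject precomputed document vectors into every transformer
# layer at inference time via the past_key_values hook. All names below
# (doc_embedding, prefix_len, proj) are illustrative placeholders, not a
# real library API for this technique.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model.eval()

cfg = model.config
n_layer, n_head = cfg.n_layer, cfg.n_head
head_dim = cfg.n_embd // n_head
prefix_len = 8  # number of "virtual tokens" injected into each layer

# Stand-in for a real document embedding (e.g. from a sentence encoder).
doc_embedding = torch.randn(cfg.n_embd)

# Placeholder projection: expand one document vector into per-layer
# key/value prefixes. An untrained linear map like this will NOT produce
# useful prefixes; it only demonstrates the plumbing.
proj = torch.nn.Linear(cfg.n_embd, n_layer * 2 * prefix_len * cfg.n_embd)
with torch.no_grad():
    kv = proj(doc_embedding).view(n_layer, 2, prefix_len, n_head, head_dim)
    # GPT-2 expects one (key, value) pair per layer, each shaped
    # (batch, n_head, seq_len, head_dim).
    past_key_values = tuple(
        (layer[0].permute(1, 0, 2).unsqueeze(0),
         layer[1].permute(1, 0, 2).unsqueeze(0))
        for layer in kv
    )

prompt = tokenizer("Summarize the document:", return_tensors="pt")
# The attention mask must also cover the injected prefix positions.
attention_mask = torch.cat(
    [torch.ones(1, prefix_len, dtype=torch.long), prompt.attention_mask],
    dim=1,
)
with torch.no_grad():
    out = model(prompt.input_ids,
                past_key_values=past_key_values,
                attention_mask=attention_mask)
print(out.logits.shape)  # (1, prompt_len, vocab_size)
```

The upside this shows: the injected vectors never consume prompt tokens, so the document content rides outside the normal context budget. The open question is exactly the one prefix-tuning answers with training: untrained vectors dropped into a model's key/value space are noise to it, so some learned alignment step seems unavoidable.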