People often bring up the incredible efficiency improvements of LLMs over the last few years, but I don't think anyone does a good job of putting into perspective just how much more efficient they have become. I have a machine at home with a single RX 7900 XTX in it. On that machine, I can run language models that blow GPT-3.5 Turbo out of the water in quality, knowledge, and even speed! That is crazy when you consider how large and capable that model was.

I can often get away with using models locally in contexts where I care about privacy. Sometimes I also use more capable models through APIs to generate richer prompts than I could write myself, and then use those prompts to better guide local models, as in the sketch below.
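Here is a minimal sketch of that pattern, assuming the `openai` Python package and a local OpenAI-compatible server (llama.cpp and Ollama both expose one); the model names, the localhost URL, and the example task are all placeholders for whatever you actually run:

```python
from openai import OpenAI

# Hosted, more capable model: used only to write a richer prompt.
remote = OpenAI(api_key="YOUR_API_KEY")

# Local model behind an OpenAI-compatible endpoint; the private data
# never leaves this machine. URL and key are placeholders.
local = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

task = "Summarize my journal entries from this week."

# Step 1: ask the hosted model for a detailed system prompt for the task.
prompt_response = remote.chat.completions.create(
    model="gpt-4o",  # hypothetical choice of hosted model
    messages=[{
        "role": "user",
        "content": "Write a detailed system prompt that would guide a "
                   f"smaller language model to do this task well: {task}",
    }],
)
system_prompt = prompt_response.choices[0].message.content

# Step 2: run the actual task locally, guided by the generated prompt.
result = local.chat.completions.create(
    model="local-model",  # whatever name your local server exposes
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": task + "\n\n<journal entries here>"},
    ],
)
print(result.choices[0].message.content)
```

The nice property of splitting it this way is that only the generic task description ever touches the remote API; the private content itself is only ever sent to the local model.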