Recently, I started a personal project to build an LLM from scratch.
I've spent a ton of time reading up on math, ML, and DL through books, open courses, and papers, while also studying all the major open-source LLM architectures.
Since I only have one DGX Spark machine to run experiments, I can't train a massive LLM from the get-go. Instead, I'm experimenting with an auto-scaling parameter mechanism, which has led me to create a pretty unconventional and fun architecture!
Why go through all this effort when modern LLMs can basically write simple LLMs themselves, and I clearly can't out-compute the big tech giants?
Honestly, it's because I'm obsessed with the core mechanics of LLMs. I want to build something exclusively for myself and, hopefully, discover some new mechanisms along the way.
I'm just keeping a record and sharing my progress. Having fun with it is truly the biggest reward!
I'll share it when I get a chance!
Do share! I read all the blog posts where people share their experiences of building small scale LLMs "from scratch".
Most hobbyists rent compute for training models instead of purchasing it all outright.