Recently, I started a personal project to build an LLM from scratch.

I've spent a ton of time reading up on math, ML, and DL through books, open courses, and papers, while also studying all the major open-source LLM architectures.

Since I only have one DGX Spark machine to run experiments, I can't train a massive LLM from the get-go. Instead, I'm experimenting with an auto-scaling parameter mechanism, which has led me to create a pretty unconventional and fun architecture!

Why go through all this effort when modern LLMs can basically write simple LLMs themselves, and I clearly can't out-compute the big tech giants?

Honestly, it's because I'm obsessed with the core mechanics of LLMs. I want to build something purely for myself and, hopefully, discover some genuinely new mechanisms along the way.

For now, I'm just keeping a record and sharing my progress. Having fun with it is truly the biggest reward!

I'll share more when I get a chance!