If you aren't already aware, Karpathy has several videos that could get you there in a few hours https://www.youtube.com/@AndrejKarpathy
If you aren't already aware, Karpathy has several videos that could get you there in a few hours https://www.youtube.com/@AndrejKarpathy
very thanks!
Also check out his nanochat repo. I used the repo, claude and shadeform to train my own mini model for about $300. Would have been less but I screwed up and let the cloud gpu rental run for a few hours even though the training run errored out.
Of course the model was dumber than GPT2 but still it was a great learning experience.