A couple of days ago, I used Claude to implement an improved version of GPT-1. I'm by no means an ML engineer; I'm just a regular backend engineer. I ended up creating a hybrid between GPT-1 and modded-nanogpt (by KellerJordan).
I was able to reproduce most of the results of the original GPT-1 paper on my gaming PC, which doesn't even have a lot of VRAM: my NVIDIA GeForce RTX 2060 SUPER got there with just one hour of training. If you're interested in pre-training LLMs, I'd totally recommend trying the same.
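If you're wondering why a consumer card can handle this at all, here's a rough back-of-envelope sketch. It assumes the hyperparameters from the original GPT-1 paper (12 layers, 768-dim hidden state, 3072-dim feed-forward, 512-token context, a ~40k BPE vocab, learned position embeddings, output head tied to the token embedding) and plain fp32 training with Adam. My repo is a hybrid with modded-nanogpt, so its actual tokenizer, optimizer, and precision may differ; `GPT1Config`, `count_params`, and `training_state_gb` are just illustrative names, not code from the repo.

```python
from dataclasses import dataclass


@dataclass
class GPT1Config:
    # Assumed values from the original GPT-1 paper; the repo may use different ones.
    vocab_size: int = 40478
    n_ctx: int = 512
    n_layer: int = 12
    d_model: int = 768
    d_ff: int = 3072


def count_params(cfg: GPT1Config) -> int:
    d, f = cfg.d_model, cfg.d_ff
    # Token embedding + learned position embedding (output head assumed tied, so no extra matrix).
    embed = cfg.vocab_size * d + cfg.n_ctx * d
    # Fused QKV projection and attention output projection, both with biases.
    attn = (3 * d * d + 3 * d) + (d * d + d)
    # Two-layer feed-forward block with biases.
    mlp = (d * f + f) + (f * d + d)
    # Two LayerNorms per block, each with a gain and a bias vector.
    ln = 2 * (2 * d)
    return embed + cfg.n_layer * (attn + mlp + ln)


def training_state_gb(n_params: int, bytes_per_param: int = 16) -> float:
    # fp32 weights (4 B) + fp32 grads (4 B) + Adam first/second moments (4 B + 4 B).
    # Activations are NOT included; they depend on batch size and sequence length.
    return n_params * bytes_per_param / 1e9


if __name__ == "__main__":
    n = count_params(GPT1Config())
    print(f"params: {n:,}")
    print(f"weights + grads + Adam state: {training_state_gb(n):.2f} GB")
```

Under those assumptions this lands at roughly 117M parameters and under 2 GB of static training state, which leaves the rest of an 8 GB card for activations. That's why batch size and sequence length, not the model itself, end up being the main knobs on a GPU like this.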
The code is here: https://github.com/epoyraz/modded-gpt-1
Alternatively, you can just ask Claude 4.8 or Codex 5.5.