It's unclear to me what the end result is. Did you build real hardware or is it simulated somehow? If it's hardware, what kind and how did you make it?

Verilog spec, by the looks of it. So you should be able to get it running on an FPGA, or, if you happen to have a chip fab in your garage, make your own silicon ;) I'd go the FPGA route.

Based on the code in the repo, it looks like they designed the chip in Verilog and then ran it in a simulator. But since they have the Verilog code, in principle they could synthesize it, send it off to a fab, and get real hardware back.

Next step would be to try it out on an FPGA.

I feel like I missed a whole section somewhere. "Built a toy TPU". What does that mean? I have no idea what was actually "built" here.

By "toy TPU" we mean a minimal TPU-like accelerator; we simulated a forward pass + backprop on it.

all in simulation :)