The framework used in the book, malt[0], is not currently GPU-accelerated, but GPU support is being worked on.
Possibly of interest: I used it for a toy implementation of the GPT architecture[1] in about 500 lines.
(I studied with one of the authors, Dr. Daniel Friedman; I wasn't heavily involved in the book, but I proofread a late draft and TA'd for a course based on it.)