I find it helpful to read a saxpy and a GEMM kernel for a new accelerator like this. Do they have examples?