That's super helpful, thanks for the details. Makes sense now that PSNT is more of a transport/runtime format for the PS2 constraints than a quality hack.

Very cool that it supports BitNet too, even if results are rough right now. Feels like there's a lot of room to tune there over time. When you do fix tok/sec, are you planning to post per-stage timings too (tokenizer, weight stream, matmul, sampling)? Would be awesome to see where the biggest bottleneck is on real hw.