Looking forward to next time, hoping you mention speculative decoding and MTP :)
It would support your point about the performance of 20GB local models.