Hacker News

rahimnathwani an hour ago

Looking forward to next time, hoping you mention speculative decoding and MTP (multi-token prediction) :)

It would support your point about the performance of 20GB local models.