I'm not sure, but I suspect that LLM weights don't compress all that well, at least not losslessly. The intuition here is that training an LLM is a form of compression of the training data into the weights, so the weight bytes are probably already close to maximally information dense. Can't squeeze near-random data down much further.
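A quick sketch of that intuition: lossless compressors barely dent high-entropy bytes. Here random float32 values stand in for weight tensors (an assumption for illustration; real checkpoints have somewhat more structure, e.g. correlated exponent bits), compared against an all-zeros buffer of the same size:

```python
import random
import struct
import zlib

random.seed(0)
n = 250_000

# Stand-in for trained weights: random float32 values, mostly random bits.
weights = struct.pack(f"{n}f", *(random.random() for _ in range(n)))
# Highly structured data for comparison: the same number of zero bytes.
zeros = bytes(len(weights))

# Compressed size as a fraction of original size (lower = more compressible).
ratio_weights = len(zlib.compress(weights, 9)) / len(weights)
ratio_zeros = len(zlib.compress(zeros, 9)) / len(zeros)

print(f"random-float 'weights': {ratio_weights:.2f} of original size")
print(f"zeros:                  {ratio_zeros:.4f} of original size")
```

The random floats stay close to their original size while the zeros collapse to almost nothing, which is the "can't squeeze them down much" claim in miniature.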