Interesting to see a pivot away from MoE by both IBM and Mistral, while the larger SOTA model classes all seem to be sticking with it.

Quick vibe check of it (8B @ Q6): seems promising. Bit of a clinical tone, but I can see that being useful for data processing and similar. Sometimes you really don't want an LLM that spams you with emojis...

Makes sense: dense for small models, dense or MoE for larger ones. They end up fitting various hardware setups pretty neatly — there's no need for MoE at smaller scale, and dense gets too heavy at large scale.

I never want an LLM to spam me with emojis. What is the use case for that? I find it highly annoying.

I think it can be a plus in moderation, e.g. in openclaw it can add some character.

But yeah, I dislike that style where each heading and bullet point gets an emoji.

Shh, people are paying for each token. Don't get them asking too many questions.