The previous version of this model was pretty bad, but it claimed to adhere to copyright laws. Based on my testing, that's not true either, so in my view it is completely useless.

So far, the smallest model I have actually seen behave in a way that feels consistent with the contemporary LLM chat experience is Gemma 4 12B, particularly the QAT build. The E4B model is not bad (it has good conversational flow and responds well if nudged), but the 12B model feels capable.

Nothing below that really seems to be good for anything other than training for specific tasks. I have not been impressed by the earlier Apertus 8B model, which doesn't feel like it really responds to nudges.

I am a strong believer in smaller models, so I might try one of these out of curiosity to see if it might do useful things in limited contexts.

As long as the following remains true, this release ends up a bigger contribution to science at large than most other models trained "behind closed doors":

> Fully open model: open weights + open data + full training details including all data and training recipes

Is a recipe useful if no one likes it?

There are equally open, much more useful models out there: https://artificialanalysis.ai/?models=nvidia-nemotron-3-ultr...

Nemotron still has partially closed data. Having multiple models to choose from is a good thing.

It uses FineWeb, which is derived from Common Crawl, which is an unlicensed scrape of web pages.

You don't need a license to scrape the public web, analyze it, tokenize it, or otherwise transform it. Let's not expand copyright beyond the horrible monster it already is.

I think it's likely that US law will continue to find training on scraped, unlicensed data to be legal.

That doesn't mean much to the many people I know of who refuse to use a technology that they see as being unethically created using the work of others without compensating them.

I continue to hope that someone will train a "vegan" model on licensed or out-of-copyright data so those people can experience the benefits of this class of technology.

(I compare them to vegans because, as with vegans, I think their ethical position is credible and has merit even though I do not choose the same ethical framework for myself.)

This is as ethical as it gets. They're getting compensated by being able to use the result of their work freely. This is the rising tide that lifts all boats.

Good luck convincing the training data licensing holdouts of that.

I'm curious how you test; could you explain? Do you have a set of factoids that should be subject to copyright, but are somehow literally (whole work) generated by the model in question?