Hacker News

There's an absolutely massive cultural and behavioural bias in those models. They will suggest things like "go to the hospital" for problems that call for a GP appointment, or "just drive three hours" when it's faster to get there by train, and so on. They will do it in anglicised Dutch (compound words split, English-like grammar structures) that's perfectly understandable, but the cultural bias is there if you know to look for it.

Furthermore, the expertise in designing and training these models is valuable in itself. The existing models are a good starting point for learning from previous mistakes, but we should not just let a handful of American and Chinese people keep the knowledge and expertise.

One problem with this particular project, though, is that copyright has been enforced for Dutch LLM training before, and the AI industry cannot exist without massive-scale piracy, the likes of which have never been seen before. A lot of Dutch training material exists in pirated books that AI companies in countries that do not care about copyright have access to, but those books are excluded from the training set here. The impact of enforcing copyright on an AI model will be quite interesting to see.