Report: https://github.com/swiss-ai/apertus-tech-report/raw/refs/hea...
Key features
- Fully open model: open weights, open data, and full training details, including all data and training recipes
- Massively multilingual: 1811 natively supported languages
- Compliant: Apertus is trained while respecting the opt-out consent of data owners (even retrospectively) and avoiding memorization of training data
Their struggle with the Nvidia driver bugs they had to work around was very relatable. You'd think that if someone buys 10,752 of their high-end GPUs, they'd get some support with it.
Agreed, but the problem seems to be even worse with AMD from what I hear, or at least it was when I checked with some of my HPC buddies a little over a year ago: constant driver bugs and crickets from upstream "support".
No, you have to pay the yearly per-GPU license for that.
Did I miss a blog post on this?
We didn't have time to write one yet, but the tech report already has a lot of details.
The report is packed with interesting details. The engineering challenges and solutions chapter in particular shows how things that are supposed and expected to work break when put through massive scale. Really difficult bugs. Great writeup.
thank you!
Looks like the performance is pretty decent, somewhere around Llama 3.1 for general knowledge (Table 17) but still a bit behind in code and reasoning (Table 18). Llama 3.1 was released about a year ago.
There's an interesting "Swiss AI Charter" on pg. 107.