At least not "open source"
> "open weights + open data + full training details including all data and training recipes"
Is it reproducible?
> respecting opt-out consent of data owners (even retrospectively)
Were they notified and given an option to opt out? Owners and authors are not the same. Data owners aren't copyright owners either.
> avoiding memorization of training data
Not convincing.
I saw some of the pretraining code on GitHub, but not the post-training code.
The post-training codebase is here: https://github.com/swiss-ai/posttraining