This seems to be focused only on changing the tools and workflows it uses, nothing foundational.

> nothing foundational

I don't think scaling this up to also run training runs with the models is something that small labs or PhD students can do. They lack the compute for that by orders of magnitude. Trying it with toy models might not work, and trying it with reasonably large models is out of their budget. The only ones who can realistically do this are the large labs (Google, OpenAI, Meta, etc.).