Congratulations! I’ll be interested to see the next steps in alignment. Do you plan to start selling access, or to collect more data and train bigger and better models? Which tasks or benchmarks are your biggest guiding stars, and what turned out to be unexpectedly tricky? A few are hinted at in the post.

It would be pretty interesting to see activation maps for the encoder on video. Seeing the compression it learns from so much training would help build confidence.

We have an alignment blog post dropping soon! We're scaling up over the next couple of months, then hopefully opening up an API or licensing it.

Benchmarks are really fun, and we have lots of secret ones. Our main thesis is that you should use the same benchmarks to measure a human's ability to use a computer as you would to measure an AI model's. That definitely includes a suite of continuous, long-term planning tasks (games), as well as everyday tasks such as marking emails as spam.

Definitely! We're looking into more interpretability work and visualizations in general as we scale up.