Next, try actually teaching it to see by training a projector with a vision encoder on gpt-oss.

About to release GPT-OSS-120B-Vision and GPT-OSS-20B-Vision, how about that! :D

too slow bro

might be slower, but then it can get the actual image as input, not just some description of it
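
fwiw, a rough sketch of what "training a projector with a vision encoder" means in terms of data flow. This is only a toy illustration, not the actual GPT-OSS-Vision code: the dimensions, the single linear layer, and all function names here are made up for the example (real projectors are often small MLPs, and the weights would be trained, not random). The point is just that image patches get mapped into the same embedding space as text tokens, so the model sees the image itself rather than a caption of it.

```python
import random

# Toy sizes, chosen only for illustration (assumption, not real model dims).
VISION_DIM = 4   # size of each patch feature from the vision encoder
LLM_DIM = 6      # size of the language model's token embeddings


def make_linear(in_dim, out_dim, seed=0):
    """Hypothetical projector: one weight matrix, randomly initialised.
    In practice this is the part that gets trained."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
            for _ in range(out_dim)]


def project(patch_features, weight):
    """Map each patch vector from VISION_DIM to LLM_DIM (matrix-vector product)."""
    return [
        [sum(w * x for w, x in zip(row, patch)) for row in weight]
        for patch in patch_features
    ]


def build_input_sequence(image_tokens, text_embeddings):
    """Prepend projected image tokens to the text embeddings, so the
    language model attends over both in one sequence."""
    return image_tokens + text_embeddings


# Stand-ins for a vision encoder's output (3 patches) and 2 text token embeddings.
patches = [[0.1 * i + 0.01 * j for j in range(VISION_DIM)] for i in range(3)]
text = [[0.0] * LLM_DIM for _ in range(2)]

weight = make_linear(VISION_DIM, LLM_DIM)
image_tokens = project(patches, weight)
sequence = build_input_sequence(image_tokens, text)

print(len(sequence), len(sequence[0]))  # 5 6
```

the extra image tokens make the sequence longer, which is where the slowdown comes from.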