Their voice-capable model is several generations behind the state-of-the-art text-only one, as far as I know.
I don’t think it even has reasoning tokens, so it’s no surprise that it’s at most as smart as the “instant” models (i.e., not very).