I have a naive question here. First, the token speed is very impressive, but why is this the highlight? I would rather see the actual performance.
Token generation speed matters for sequential agentic workflows, like software engineering / vibe coding, where many rounds of reasoning, code generation, refactoring, and testing run in a loop before an actual outcome is served to the user.
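To put rough numbers on that, here is a back-of-the-envelope sketch. The step count, tokens per step, and both speeds are made-up values for illustration, not measurements of any engine, and the function name is hypothetical. It only counts generation time and ignores tool calls, test runs, and network latency.

```python
def loop_generation_seconds(steps: int, tokens_per_step: int, tokens_per_second: float) -> float:
    """Time spent purely on token generation across a sequential agent loop.

    Each step has to finish before the next one starts, so the per-step
    generation times simply add up.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return steps * tokens_per_step / tokens_per_second


# Hypothetical workload: 20 sequential steps, 2,000 generated tokens each.
STEPS = 20
TOKENS_PER_STEP = 2_000

# Hypothetical speeds, chosen only to show the scaling.
slow = loop_generation_seconds(STEPS, TOKENS_PER_STEP, 50)
fast = loop_generation_seconds(STEPS, TOKENS_PER_STEP, 1_000)

print(f"50 tok/s:    {slow:.0f} s of generation")
print(f"1,000 tok/s: {fast:.0f} s of generation")
```

Under these assumed numbers the same loop takes roughly 13 minutes of pure generation at the lower speed and well under a minute at the higher one, which is why speed shows up directly in how long the user waits.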
As for model performance, we plan to support the latest frontier models; this tech preview is about the speed of the engine.