30 tok/s looks fine when you're just streaming code, but a lot of those tokens go to overhead that isn't code: tool-calling conventions, metadata, "thinking", etc.
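To put rough numbers on that, here's a back-of-the-envelope sketch. The overhead fractions are made-up illustrative values, not measurements, and `effective_rate` is just a hypothetical helper name:

```python
def effective_rate(raw_tok_per_s: float, overhead_fraction: float) -> float:
    """Tokens per second left for visible output once overhead is subtracted.

    overhead_fraction is the assumed share of generated tokens spent on
    tool-call scaffolding, metadata, "thinking", and so on.
    """
    if not 0.0 <= overhead_fraction <= 1.0:
        raise ValueError("overhead_fraction must be between 0 and 1")
    return raw_tok_per_s * (1.0 - overhead_fraction)


if __name__ == "__main__":
    raw = 30.0  # raw generation speed from the comment above
    # Illustrative overhead shares only; real values depend on the model and tooling.
    for overhead in (0.0, 0.25, 0.5, 0.75):
        rate = effective_rate(raw, overhead)
        print(f"{overhead:.0%} overhead: {rate:.1f} tok/s of actual code")
```

If half the tokens are overhead, the 30 tok/s you're paying for feels like 15 tok/s of code on screen.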