Hacker News

fragmede 2 months ago

I believe you when you say you're not changing the model file loaded onto the H100s or whatever, but there's something going on, beyond just being slower, when the GPUs are heavily loaded.

clbrmbr 2 months ago

I do wonder about reasoning effort.

hauntsaninja 2 months ago

Reasoning effort is denominated in tokens, not time, so there's no difference beyond slowness under heavy load.

(I work at OpenAI)