They are only non-deterministic when you’re doing batching and a kernel ends up running across a “random” set of token streams. If you’re only processing one user’s request, they’re very much deterministic.
They are only non-deterministic when you’re doing batching and a kernel ends up running across a “random” set of token streams. If you’re only processing one user’s request, they’re very much deterministic.