The other comments are correct, but let me try a different phrasing, because it's a complex topic. Attestation has two parts: the hardware provides the keys and the measurement machinery, which you can't change as a user, and the software provides the extra information/measurements to the hardware.
That means you can't simulate the hardware in a way that would allow you to cheat (the keys/method won't match). And you can't replace the software part (the measurements won't match).
It all depends on the third party and the hardware keys not leaking, but as long as you can review the software part, you can be sure that validating the value sent with the response is enough.
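Roughly, the verifier's job boils down to two checks. Toy sketch below, assuming an Ed25519 key as a stand-in for the hardware key (real schemes like SEV-SNP/TDX verify a vendor certificate chain instead) and made-up report field names:

    import hashlib
    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

    def verify_report(report: dict, hw_pubkey: Ed25519PublicKey,
                      expected_measurement: bytes, nonce: bytes) -> bool:
        # Hardware part: the report must be signed by a key only the real chip
        # holds, so simulating the hardware can't produce a valid signature.
        try:
            hw_pubkey.verify(report["signature"], report["measurement"] + nonce)
        except InvalidSignature:
            return False
        # Software part: the measured software stack must match the value you
        # (or a third party) computed from the reviewed code, so swapping the
        # software changes the measurement and the check fails.
        return report["measurement"] == expected_measurement

    # The expected measurement is just a digest of the reviewed software image.
    expected = hashlib.sha384(b"<bytes of the reviewed software image>").digest()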
I understand hardware attestation at this level. What I'm working on understanding is why you couldn't relay a hardware attestation from a different machine, one that isn't the machine the user actually cares about.
Because to obtain the result of the attestation, you'd need to actually run the prompt on the verified machine in the first place. (And in practice the signature would be bound to your response as well.)
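One way to picture the response binding (again a made-up sketch, not any particular vendor's format): the report carries the public half of a key that only exists inside the attested machine, and each response is signed with it, so a report relayed from a different machine can't be paired with a response that machine never produced.

    import hashlib
    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

    def response_is_bound(verified_report: dict, response_body: bytes,
                          response_sig: bytes) -> bool:
        # The (already attestation-verified) report names the public key of the
        # machine that produced it; a response is only accepted if it's signed
        # by that same machine's private key.
        attested_key = Ed25519PublicKey.from_public_bytes(verified_report["response_pubkey"])
        try:
            attested_key.verify(response_sig, hashlib.sha256(response_body).digest())
        except InvalidSignature:
            return False   # response didn't come from the machine the report describes
        return True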
The attestation report is produced after the user sends a prompt to the LLM? I thought it was the proof that the correct model weights are loaded on some machine.
The attestation report is produced ahead of time and verified on each connection (before the prompt is sent). Every time the client connects to make an inference request via one of the Tinfoil SDKs, the attestation report is checked against a known-good, public configuration to ensure the connection is to a server running the right model.
The attestation is tied to the Modelwrap root hash (the root hash is included in the attestation report), so you know the machine serving the model has the right model weights.
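If it helps, the per-connection check has roughly this shape. This is not the actual Tinfoil SDK API; the field names, the "model_root_hash" label, and the transport object are assumptions for illustration:

    from typing import Protocol

    class AttestedServer(Protocol):        # hypothetical transport, just for the sketch
        def get_attestation_report(self) -> dict: ...
        def infer(self, prompt: str) -> str: ...

    KNOWN_GOOD = {                         # published, reviewable configuration
        "measurement": "digest-of-the-reviewed-software-stack",
        "model_root_hash": "root-hash-of-the-expected-model-weights",
    }

    def connect_and_infer(server: AttestedServer, prompt: str) -> str:
        report = server.get_attestation_report()   # produced ahead of time by the hardware
        # (hardware signature check over the report omitted here, see the earlier sketch)
        if report["measurement"] != KNOWN_GOOD["measurement"]:
            raise RuntimeError("server is not running the reviewed software")
        if report["model_root_hash"] != KNOWN_GOOD["model_root_hash"]:
            raise RuntimeError("server does not have the expected model weights")
        return server.infer(prompt)                # the prompt only leaves after the checks pass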