They must have some sort of smoke tests for common operations, run in a test harness with the system prompts they force on users, right?

....Right?

What kind of Mickey Mouse operation are they running over there?

In the original Claude degradation follow-up email, Boris mentioned they are upping the percentage of engineers required to use the public version of Claude Code. I have no idea what percentage this is, or how much of a punishment it is considered to be. :)

That said, I was sympathetic to the recent bug reports: to trigger one, you’d need a session that sat idle for an hour and then very specifically tested for in-context retrieval. I don’t want to run that test, do you want to run that test?

I wouldn't bet a chocolate chip cookie on that.