They do. I'm currently seeing a degradation on Opus 4.6 on tasks it could do without trouble a few months back. Obviously I'm a sample of n=1, but I'm also convinced a new model is around the corner and they preemptively nerf their current model so people notice the "improvement".
Make that 2. I told my friends yesterday, "Opus got dumb, new model must be coming".
I swear that different sessions get routed to different quants. Sometimes it's good, sometimes not.