Numbers are from https://www.fitmyllm.com/, so they're not a real hardware benchmark, just what you're expected to get. YMMV.

Ah, ok. I took a look at the 3090 numbers and they list 400 tok/s prefill, so if I normalize my expectations to that baseline, the numbers you posted do make sense. I haven't dug deep into that site's methodology, but their estimates seem way off, especially since they don't take cache quant into account when deciding whether or not you can run a model. Overall I found that website a bit confusing, but maybe the UX just didn't click with me.
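
To show why cache quant matters for the "can you run it" check, here's a rough back-of-the-envelope sketch. The model shape (32 layers, 8 KV heads, head dim 128), the 19 GiB of weights, and the 1.5 GiB overhead are made-up illustration numbers, not anything taken from that site, and I'm approximating q8 as 1 byte per element (real quantized cache formats carry a bit of extra per-block overhead).

```python
GIB = 2**30

def kv_cache_gib(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem):
    """Approximate KV cache size in GiB for a standard (GQA) transformer."""
    # Factor of 2: one tensor for keys, one for values, per layer.
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem
    return total_bytes / GIB

def fits_in_vram(weights_gib, kv_gib, vram_gib, overhead_gib=1.5):
    """Crude fit check: weights + KV cache + assumed runtime overhead vs. VRAM."""
    return weights_gib + kv_gib + overhead_gib <= vram_gib

# Hypothetical model shape and a 32k context, on a 24 GiB card.
shape = dict(n_layers=32, n_kv_heads=8, head_dim=128, context_len=32768)

kv_f16 = kv_cache_gib(**shape, bytes_per_elem=2.0)  # f16 cache
kv_q8 = kv_cache_gib(**shape, bytes_per_elem=1.0)   # ~q8 cache (approximation)

print(f"f16 cache: {kv_f16:.1f} GiB, fits: {fits_in_vram(19.0, kv_f16, 24.0)}")
print(f"q8 cache:  {kv_q8:.1f} GiB, fits: {fits_in_vram(19.0, kv_q8, 24.0)}")
```

With these assumed numbers the f16 cache pushes the total just over 24 GiB while the q8 cache leaves room to spare, so a fit check that ignores cache quant would call the same model unrunnable.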