Short answer: they mostly aren't.

A few people are running heavily quantized models with limited context windows. It's still impressive, but it isn't benchmark-level intelligence. Very few people can afford a rig that gives reasonable local performance at a reasonable quant and full context size.

The antirez example is a 2.6-bit quant with 32k context, running at a few tokens per second... on a ~$7000 MacBook M5 (new RAM pricing).
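
For a rough sense of why the quant level matters so much, here's a back-of-the-envelope sketch of weight memory as parameters × bits per weight / 8. The 100B parameter count is a made-up number for illustration, not the model from the antirez example, and this ignores KV cache, activations, and runtime overhead, so real usage would be higher.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Hypothetical model size, chosen only for illustration.
N_PARAMS = 100e9

# 16 bits is the usual unquantized baseline; 2.6 matches the quant mentioned above.
for bits in (16, 8, 4, 2.6):
    print(f"{bits:>4} bits/weight: ~{weight_memory_gb(N_PARAMS, bits):.1f} GB")
```

The point is just the ratio: going from 16 bits to 2.6 bits cuts weight memory by roughly 6x, which is what gets a large model onto a laptop at all, at the cost of quality.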