This is exactly what I've been looking for. I run a Mac mini as an always-on server for a side project, and the API costs for cloud models are adding up fast. Being able to run something locally, even at slower speeds, would be a game changer for background tasks. What kind of tokens/sec are you seeing on a base M2 Mac mini with 16GB?