'fast'
I'm sure it can do 2+2 fast.
After that? No way.
There is a reason NVIDIA is #1 and my Fortune 20 company did not buy a MacBook for our local AI.
What inspires people to post this? Astroturfing? Fanboyism? Post-purchase remorse?
I have a Mac Studio M3 Ultra on my desk, and a user account on an HPC full of NVIDIA GH200s. I use both, and the Mac has its purpose.
Notably, it can run some of the best open-weight models with little power and without triggering its fan.
It can run them, and the token generation is fast enough, but the prompt processing is so slow that it makes them next to useless. That is the case with my M3 Pro at least, compared to the RTX I have in my Windows machine.
This is why I'm personally waiting for the M5/M6 to finally get decent prompt processing performance; it makes a huge difference in all the agentic tools.
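If you want to put numbers on that gap yourself, here's a rough sketch that times prefill and decode separately with llama-cpp-python (assuming it's installed; ./model.gguf is just a placeholder path to whatever local GGUF you have):

    import time
    from llama_cpp import Llama  # pip install llama-cpp-python

    llm = Llama(model_path="./model.gguf", n_ctx=4096, verbose=False)
    prompt = "word " * 1000  # long prompt so prompt processing dominates

    t0 = time.time()
    llm(prompt, max_tokens=1)       # roughly one prefill pass
    prefill = time.time() - t0

    llm.reset()                     # drop the KV cache so the prompt is processed again
    t0 = time.time()
    llm(prompt, max_tokens=129)     # prefill again plus ~128 decode steps
    decode = max(time.time() - t0 - prefill, 1e-9)

    print(f"prompt processing: ~{prefill:.1f}s, generation: ~{128 / decode:.1f} tok/s")

On Apple Silicon the second number tends to look fine while the first one is what hurts in agentic use, where every turn re-feeds a long context.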
Just add a DGX Spark for the prefill and stream to the M3 using Exo. The DGX Spark should have about the same FP4 compute as an M5 Ultra, and you don't have to wait until Apple releases one. Also, a 128GB "appliance" like that is now "super cheap" given the RAM prices, and this won't last long.
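For anyone wondering why that split even works: prefill produces the KV cache, and decoding only needs that cache plus the newest token, so the two phases can in principle live on different machines. This is not Exo's actual API, just the idea shown in one process with Hugging Face transformers and a placeholder gpt2 checkpoint:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "gpt2"  # placeholder model so the sketch runs anywhere
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name).eval()

    ids = tok("A long prompt goes here", return_tensors="pt").input_ids

    with torch.no_grad():
        # "Prefill": one forward pass over the whole prompt, producing the KV cache.
        out = model(ids, use_cache=True)
        past = out.past_key_values

        # "Decode": generate token by token, feeding only the newest token plus the cache.
        # In the DGX-Spark-plus-Mac setup, this cache is what would move to the second box.
        next_id = out.logits[:, -1:].argmax(-1)
        generated = [next_id]
        for _ in range(20):
            out = model(next_id, past_key_values=past, use_cache=True)
            past = out.past_key_values
            next_id = out.logits[:, -1:].argmax(-1)
            generated.append(next_id)

    print(tok.decode(torch.cat(generated, dim=-1)[0]))

The prefill pass is compute-bound and the decode loop is memory-bandwidth-bound, which is why pairing a compute-heavy box with a memory-heavy one is attractive.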
>with little power and without triggering its fan.
This is how I know something is fishy.
No one cares about this. This became a new benchmark when Apple couldn't compete anywhere else.
I understand that if you've already made the mistake of buying something that doesn't perform as well as you were expecting, you're going to look for ways to justify the purchase. "It runs with little power" is on zero people's Christmas list.
It was for my team. Running useful LLMs on battery power is neat, for example. Some people simply care a bit about sustainability.
It’s also good value if you want a lot of memory.
What would you advise for people with a similar budget? It's a real question.
But you aren't really running LLMs. You just say you are.
There is novelty, but no practical use case.
My $700 laptop from 2023 with a 3060 runs 8B models. At the enterprise level we got two A6000s.
Both are useful and were used for economic gain. I don't think you have gotten any gain.
Yes, a good phone can run a quantised 8B too.
Two A6000s are fast but quite limited in memory. It depends on the use case.
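For a sense of scale, some back-of-the-envelope weight-only arithmetic (the bits-per-weight figures are approximate for a Q4-ish quant, and KV cache plus runtime overhead come on top):

    # Rough weight-only memory for a few model sizes.
    def weights_gb(params_billion: float, bits_per_weight: float) -> float:
        return params_billion * 1e9 * bits_per_weight / 8 / 1e9

    for label, params, bits in [("8B  @ ~Q4 ", 8, 4.5),
                                ("70B @ ~Q4 ", 70, 4.5),
                                ("70B @ fp16", 70, 16)]:
        print(f"{label}: ~{weights_gb(params, bits):.1f} GB")

    # Prints roughly: 4.5 GB, 39.4 GB, 140.0 GB.
    # An 8B at ~Q4 fits on a good phone or a laptop 3060; a 70B at ~Q4 fits in
    # two 48 GB A6000s; a 70B at fp16 already exceeds their combined 96 GB.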
>Yes a good phone can run a quantised 8B too.
Mac expectations in a nutshell lmao
I already knew this because we tried it at an enterprise level, but it makes me well aware that nothing has changed in the last year.
We are not talking about the same things. You are talking about "technically possible". I'm talking about useful.
If you are happy with 96GB of memory, nice for you.
I use my local AI, so: yes very much.
Fancy RAM doesn't mean much when you are just using it for Facebook. Oh, I guess you can pretend to use local LLMs on HN too.