> I'm curious and not an expert here, do you know why the TTFT is so much worse on Mac?
because the GPUs aren't as fantastic as everyone assumes?
> might also be less optimised in MLX?
prefill has gotta be one of the most optimized paths in MLX...
No, you don't understand: on Apple Silicon my CPU has memory bandwidth comparable to a $400 Pascal-era GPU. With the unified memory architecture, that means my iGPU gets 2016-level DDR transfer speeds with none of the upsides of CUDA. It's the most cutting-edge hardware ever put in a personal computer, without a doubt.
Please show me on the 2016-era $400 Pascal GPU where you can install the 256 GB of VRAM.