This is a pretty phenomenal article.
Even for those who don’t care about LLM use, this is just a great article on optimizing Swift performance, which is sadly a topic without much written material.
I’m curious whether the AMX instructions are truly secret. In theory, I think you could use an M4 or above and get them via SME, but I’m just guessing, as I’ve never tried intrinsics from Swift myself.
> get them via SME
I have no idea what this means - AMX was replaced by SME on M4. It's a new unit, not just an "abstract intrinsic" (which would make zero sense).
I’m not sure what part is confusing you or how to word it another way to make more sense to you.
What I’m saying is that instead of using the secret AMX instructions, just use SME, assuming they have the hardware available to them.
AMX isn’t truly gone afaik, at least according to the folks who have been looking at it. It’s just deprecated, and it seems like the architecture treats AMX and SME somewhat like aliases, preventing concurrent use of the two within a process.