So just gonna ask a question, probably will get downvoted
I know this is flash, but….
But other than this guy, did our whole society seriously never flamegraph this stuff before we started requesting nuclear reactors colocated at data centers and like more than 10% of gdp?
Someone needs to answer because this isn’t even a m4 or m5… WHAT THE FUCK
Sidenote: shout out antirez love my redis :)
This is built atop a tower of stuff people built with profiling and performance-oriented design.
That said, I've found that most corporate environments are unintentionally hostile to this kind of optimization work. It's hard to justify until the work is already done. That means you often need people with the skills, means, and motivation to do this that are outside normal corporate constraints. There aren't many of those.
Building this into agentic dev workflows (subject to token/time constraints) is something I spent a lot of time doing at work. I actually am kind of proud of that hahah
But you’re right I agree
In the corporate world they sadly don’t take kindly to performance profiling as a first class citizen
Granted I will say optimization without requirements may not be beneficial but at least profiling itself seems worthy if you have use cases.
A lot of us have been working in the network packet pusher software , distributed systems , distributed storage space
I’m happy to see more stuff like this :)
TLDR; I’ve not seen a lot of flamegraphs of Llm end to end … idk if anyone else has?
DSv4 generates much faster on NVIDIA class hardware. It is just a very efficient model.
Every lab has a bunch of people doing nothing but optimizing.
8011943553
The world is not China.