> Engineers are stuck in the old paradigm of "perfect" algorithms.
Reminds me of a misinterpretation of Knuth.

  > Premature optimization is the root of all evil.
He was definitely knocking engineers for wanting to write "perfect" algorithms, but this quote also got bastardized to mean something different from what he said (as happens to many clichés). All he was really saying was "grab a fucking profiler before you optimize."

But now, I'm not sure a lot of programmers even know what a profiler is. When was the last time you saw someone profile their code?
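For anyone whose memory is fuzzy, a profiler run can be as short as two lines. Here's a minimal sketch with Python's standard-library cProfile (the function and file names are just stand-ins for whatever slow thing you're investigating):

  import cProfile
  import pstats

  def import_spreadsheet(path):
      ...  # stand-in for the slow workload you're investigating

  # run the workload under the profiler, then print the ten worst offenders
  cProfile.run("import_spreadsheet('data.xlsx')", "profile.out")
  pstats.Stats("profile.out").sort_stats("cumulative").print_stats(10)

Ten minutes of reading that output usually beats an afternoon of guessing.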

Problem is we've taken the idea of "minimum viable product" too far. People are saying "Doesn't have to be perfect, just has to work." I think most people agree. But with the current state of things? I disagree that things even work. We're so far away from the question of optimization. It's bad enough that there are apps that require several gigs just to edit a 30kb document, but FFS, we're living in a world where Windows Hello crashes Microsoft Outlook. It's not that the programs are ugly babies that could be better; they are monstrosities begging to be put to death.

I WISH we could talk about optimization. I WISH our problem was perfectionism. But right now our problem is that everything is a steaming pile of garbage and most people are just shrugging like "it is the way it is". Just because you don't clean up that steaming pile of garbage doesn't mean someone else won't have to. So stop passing the buck.

> When was the last time you saw someone profile their code?

A year ago. I relied heavily on one to optimize a complex data import that took an hour for a million-line Excel file. The algorithm translated it into a graph according to a user-specified definition and would update an existing graph in neo4j, keeping the whole thing consistent.

The only other guy who understood the algorithm (a math PhD) thought it was as optimal as it could get. I used the profiler to find all the bottlenecks, which were all DB checks for the existence of nodes, and implemented custom indices to reduce import time from an hour to 3 minutes.
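I can't share the real code, but the shape of the fix was roughly this (official Python driver; the connection details, label, and property names are placeholders, not our actual schema): put an index on the property the existence checks hit, so a MERGE becomes an index lookup instead of a label scan.

  from neo4j import GraphDatabase

  # connection details and schema names are made up for illustration
  driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

  with driver.session() as session:
      # without this, every "does this node exist?" check scans the whole label
      session.run(
          "CREATE INDEX record_ext_id IF NOT EXISTS "
          "FOR (n:Record) ON (n.ext_id)"
      )
      # existence check + conditional create in one statement, now index-backed
      session.run(
          "MERGE (n:Record {ext_id: $ext_id}) "
          "ON CREATE SET n.created = timestamp()",
          ext_id="row-12345",
      )

  driver.close()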

It did introduce a bunch of bugs that I had to fix, but I also discovered some bugs in the original algorithm.

It was one of my best programming experiences ever. The payoff at the end especially, when it went down from an hour to 3 minutes, was a dopamine rush like never before. Now I want to optimize more code.

I don't think users cared, though; originally this work would take days by hand, so an hour was already pretty good. Now I made something fiendishly complex look trivial.

  > from an hour to 3 minutes
I bet the users cared. Yeah, coming from a few days, an hour feels great, but you also get accustomed to it.

  > It did introduce a bunch of bugs that I had to fix, but I also discovered some bugs in the original algorithm.
I find this is extremely common when I profile code. It is just so easy to miss bugs. People get lulled into a false sense of security because tests pass, but tests just aren't enough. But for some reason, when I say "tests aren't enough", people hear "I don't write tests."

Seeing those big improvements and knowing you did more than make it faster is always really rewarding. I hope you do do more optimization :) Just remember Knuth's advice, because IO is a common bottleneck and Big O isn't going to tell you about that one haha

Yeah, I first want to know there's an actual performance issue to fix. That's basically what Knuth said, and that's what I live by.

> People get lulled into a false sense of security because tests pass, but tests just aren't enough.

Users weren't using a particular feature because they said they didn't understand it. So we explained it, again and again. Turns out the feature was incredibly buggy: it basically only worked the way we claimed it did when it was used in the specific configuration we tested for. Add another node somewhere and weird stuff started happening.

The tests looked good, and code coverage was great, but the fact that the tests run through all the branches of the code doesn't mean you're really testing for all behaviour. So I added tests for all configurations I could think of. I think that revealed another bug.

So look at the actual behaviour you need to test, not merely the code and branch coverage.
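Concretely, the shape of it was roughly this (a pytest-style sketch; the knobs and helper functions are made up, not our real code): enumerate the configurations users can actually end up in and assert the behaviour for each, rather than trusting that every branch got executed once.

  import itertools
  import pytest

  # hypothetical knobs; the point is enumerating behaviours, not branches
  NODE_COUNTS = [1, 2, 3, 10]
  EDGE_KINDS = ["reference", "containment"]

  @pytest.mark.parametrize(
      "node_count,edge_kind",
      list(itertools.product(NODE_COUNTS, EDGE_KINDS)),
  )
  def test_update_keeps_graph_consistent(node_count, edge_kind):
      graph = build_test_graph(node_count, edge_kind)  # stand-in helper
      result = run_import(graph)                       # stand-in for the feature under test
      assert result.is_consistent()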

  > Yeah, I first want to know there's an actual performance issue to fix.
Honestly, I think profilers and debuggers can really help with this too.

  > So I added tests for all configurations I could think of. 
I think that's the key part. You can only test what you know or expect. So your tests can only be complete if you're omniscient.

I invite your attention to the StatsD telemetry protocol, where:

1. Every single measurement in a timeseries is encoded as a utf-8 string having (roughly) the following format:

  "${name}:${value}|${type}|${tags}"
where name is like "my.long.namespace.and.metric.name", value is a string-formatted number, god only knows what type is, and tags is some gigantic comma-separated key:value monstrosity (a minimal sender sketch follows the list).

2. Each and every one of these things is fired off into the ether in the form of a UDP datagram.

3. Whenever the server receives these, it presumably gets around at some point to assigning them timestamps and inserting them into a database, not necessarily in that or any other particular order.
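To make the fire-and-forget part concrete, a sender is roughly this much Python (host, port, metric name, and tags below are made up; type is one of "c" for counter, "g" for gauge, "ms" for timing, and the tag syntax is the Datadog-style extension):

  import socket

  STATSD_ADDR = ("127.0.0.1", 8125)  # placeholder address

  def send_metric(name, value, mtype="c", tags=None):
      payload = f"{name}:{value}|{mtype}"
      if tags:
          payload += "|#" + ",".join(f"{k}:{v}" for k, v in tags.items())
      # one UDP datagram per measurement: no ack, no ordering, no delivery guarantee
      sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
      sock.sendto(payload.encode("utf-8"), STATSD_ADDR)
      sock.close()

  send_metric("my.long.namespace.and.metric.name", 42, "c", {"env": "prod", "host": "web-1"})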

"it is the way it is[1]."

[1] https://github.com/statsd/statsd?tab=readme-ov-file#usage

I think NodeJS goes against the idea of writing good and efficient software... JS just creates unnecessary complexity

I don't really know anything about JS, but this metrics protocol is how most telemetry data is transmitted on the wire. Petabytes per day of bandwidth are wasted on this.

Ah, I'm bookmarking this. Thanks for writing this :)

I love how you put it: "grab a fucking profiler before you optimize". I get complaints sometimes about using FP because of performance, and I think a variant of "grab a fucking profiler before you optimize" is a much better response than "avoid premature optimization". Introducing them to a magical thing called a "profiler" is a nice bonus too.

> Problem is we've taken the idea of "minimum viable product" too far. People are saying "Doesn't have to be perfect, just has to work." I think most people agree. But with the current state of things? I disagree that things even work. We're so far away from the question of optimization. It's bad enough that there are apps that require several gigs just to edit a 30kb document, but FFS, we're living in a world where Windows Hello crashes Microsoft Outlook. It's not that the programs are ugly babies that could be better; they are monstrosities begging to be put to death.

LOL. OMG that was beautiful. It almost feels like we are de-evolving software to the point where shit is just going to stop working altogether. I know this isn't full of facts, but this take reminds me of Jonathan Blow's talk "Preventing the Collapse of Civilization"[0], where he talks about how code runs worse than it ever has; I think he was arguing that civilization is collapsing before our eyes in slow motion.

[0]: https://youtu.be/pW-SOdj4Kkk?si=LToItJb1Cv-GgB4q&t=1089

Good talk. I did something similar to him, and all that happened was everyone saying I was making a lot out of nothing. They're right that each thing was "nothing", but the problem is that it's a non-trivial number of "nothings" happening every day...

Honestly, I think the problem is that it's a Lemon Market[0]. Lemon markets thrive when there is asymmetric information: when a customer cannot tell the difference between a good product (a peach) and a bad product (a lemon). All it takes is a bunch of tech-illiterate people... not sure where we'll find those...

Funny thing about your video: when I was doing my PhD I had a very hard time publishing because I was building models that were much smaller and required less data, but got similar performance. Reviewers just looked at the benchmark like "not SOTA? lol". I've seen tons of great papers solving similar problems constantly get rejected. As a reviewer I frequently defended works like that, as well as works that had good ideas but just didn't have enough GPU power behind them. It was really telling...

[0] https://en.wikipedia.org/wiki/The_Market_for_Lemons

[P.S.] A nice solution I found for the pasting problem he mentioned (it comes up in various forms) is to first paste the text into the URL bar or search bar, then copy that, and then paste it where I actually want it. {<c-k>,<c-l>}<c-v><c-a><c-c>. Works 98% of the time every time.