Oh, I have a relevant and surprisingly simple example: binary search. Binary search and its variants, leftmost and rightmost binary search, are surprisingly hard to code correctly if you don't think about the problem in terms of loop invariants. I outlined the loop invariant approach in [1] with some example Python code that was about as clear and close to plain English as I could get.
Jon Bentley, the author of Programming Pearls, gave the task of writing an ordinary binary search to a group of professional programmers at IBM, and found that 90% of their implementations contained bugs. The most common one seems to have been accidentally running into an infinite loop. To be fair, this was back in the day when integer overflows had to be explicitly guarded against - but even then, it's a surprising number.
[1]: https://hiandrewquinn.github.io/til-site/posts/binary-search...
I read about this and started using binary search as my interview question. It worked well - about 2/3rds of highly credentialed applicants could not write a working implementation in 20 minutes. Most failures went into an infinite loop on simple cases! The ones who could write it usually did so quickly.
I think part of the reason is that most people were taught a bad interface. Even the code on Wikipedia says "Set L to 0 and R to n-1". That is, R is an inclusive bound. But we've learned that for most string algorithms, it is better when your upper bound is an exclusive bound, that is, n.
I’ve wanted to do an experiment testing that hypothesis. That is, ask a large number of people to write it with different function prototypes and initial calls and see how many buggy implementations I get with inclusive vs exclusive upperbound vs length.
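For what it's worth, here's roughly what that exclusive-upper-bound interface looks like. A minimal sketch of my own, not the Wikipedia code:

```python
def leftmost_bsearch(a, target):
    """Return the least index i with a[i] >= target, or len(a) if none.

    Invariant: everything left of lo is < target, and everything at or
    right of hi is >= target. hi starts at len(a), an exclusive bound.
    """
    lo, hi = 0, len(a)
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] < target:
            lo = mid + 1   # a[mid] and everything before it is too small
        else:
            hi = mid       # a[mid] >= target, so the answer is mid or earlier
    return lo

print(leftmost_bsearch([1, 3, 3, 5], 3))   # 1
print(leftmost_bsearch([1, 3, 3, 5], 6))   # 4, i.e. "not found, insert at end"
```

Note there's no `-1` anywhere, and the "miss" case falls out as `len(a)` rather than a sentinel, which is exactly why the exclusive bound tends to produce fewer off-by-one bugs.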
> I read about this and started using binary search as my interview question. It worked well - about 2/3rds of highly credentialed applicants could not write a working implementation in 20 minutes.
I'd be interested to know why you feel that this is a useful interview question.
Implementing a well-known and widely implemented algorithm under stressful interview conditions wouldn't seem to me to demonstrate much about a candidate. In that situation I'd much rather be given a printout of some code and asked to talk around the implementation - how the code maps to the abstract algorithm, implementation trade-offs, readability, etc.
To be fair you're picking an example that's extremely finicky about indices. It's probably the hardest basic algorithm to write down without errors. Up there with Hoare partition.
You're right, but invariants are the most bang for the buck you can get out of formal methods without having to use a DSL built for computer assisted proofs.
That they work for binary search is a very strong signal to people not familiar with them that they work for nearly everything (they do).
This makes for an interesting test. I checked Claude Sonnet, just for the hell of it:
    # Example usage and test cases
    if __name__ == "__main__":
        # Test cases
        test_array = [1, 3, 5, 7, 9, 11, 13, 15]

Very good! Looks good to me. One small callout:
This implementation detail is, to my knowledge, unnecessary in Python because Python's built-in int type has arbitrary-precision integers. It's intended to avoid integer overflow in languages like C. Imagine, say, that left is 1 and right is 2^63 - 1. In Python, left + right will just give you 2^63, no big deal. In C, left + right will overflow and produce undefined behavior; in practice it usually gives you, I think, -2^63, which is obviously going to screw up the bsearch a bit. So it isn't wrong, just slightly less idiomatic for the language.
Python's interpreter may or may not be able to optimize away the slight inefficiency of the extra arithmetic operation. I will only point out that most of the time, when we write our own bsearch, it's because we want to optimize something really fundamental to the codebase. Any time you have to whip up your own algorithm, it's prima facie evidence that that part of the code might be good to profile.
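Since Python ints can't overflow, the C failure mode has to be simulated to see it. Here's a small sketch (the to_i64 helper is mine, just mimicking the two's-complement wraparound most C implementations exhibit in practice; in C the overflow is formally undefined behavior):

```python
def to_i64(x):
    """Simulate the usual two's-complement wraparound of a C int64_t."""
    x &= (1 << 64) - 1
    return x - (1 << 64) if x >= (1 << 63) else x

left, right = 1, 2**63 - 1

# Naive midpoint: left + right wraps to a negative value under C semantics.
naive_mid = to_i64(left + right) // 2

# Overflow-safe midpoint: right - left never leaves the representable range.
safe_mid = left + (right - left) // 2

print(naive_mid)  # negative, i.e. a garbage index
print(safe_mid)   # 2**62, as expected
```

The `left + (right - left) // 2` form is the standard guard; in Python it's purely stylistic, in C it's the difference between a working search and UB.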
Even in languages like C, it's mostly voodoo in practice unless you're using undersized indices anyway—you surely don't have anywhere near SIZE_MAX/2 elements.
The underlying problem is sloppy use of int for indices. The roundabout calculation will still overflow at twice the size.
I knew about the infamous Binary Search bugs in books, and I dared to write the first bug-free implementation in my book, very carefully. Still, it ended up having bugs. :) Luckily, Manning's early access program let me fix them before printing.
Oh interesting, where can we read about the bug that was there?
The way I always remember leftmost and rightmost binary search (the C++ equivalents-ish of lower_bound and upper_bound) is to always have a "prior best" and then move the bounds according to the algo
while (l <= r)
{
    // find the midpoint (overflow-safe form)
    auto mp = l + (r - l) / 2;
    if (nums[mp] == target) {
        prior = mp;  // remember the best match found so far
#ifdef upper_bound
        l = mp + 1;  // move the left bound up, maybe there's more up there we can look for!
#else
        // lower bound: we found an instance of target, but let's look in the exclusive left half a bit more
        r = mp - 1;
#endif
    }
    else if (nums[mp] < target) l = mp + 1;
    else r = mp - 1;
}
excuse the terrible formatting, it's been a long day grinding leetcode after getting laid off...
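In case the #ifdef obscures it, here's the same "prior best" pattern in Python, rightmost variant only (my own sketch, with inclusive bounds):

```python
def rightmost_index(nums, target):
    """Rightmost index of target in sorted nums, or -1 if absent."""
    lo, hi = 0, len(nums) - 1   # inclusive bounds on both sides
    prior = -1                  # best match found so far
    while lo <= hi:
        mid = lo + (hi - lo) // 2
        if nums[mid] == target:
            prior = mid         # record the hit...
            lo = mid + 1        # ...but keep looking to the right
        elif nums[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return prior

print(rightmost_index([1, 2, 2, 2, 3], 2))  # 3
print(rightmost_index([1, 3], 2))           # -1
```

The nice property of this shape is that the bounds always shrink by at least one element per iteration, so the infinite-loop failure mode can't happen.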
Godspeed, fellow LeetCoder. I'm not currently grinding but I still have my handful of practice problems in active Anki rotation.
I have my rightmost code at part III of the miniseries, [1]. It looks quite similar, but I save the -1 for the very end return.
(It should technically be BEFORE, I guess. If it helps: all rightmost bsearches are also leftmost bsearches on the reversed array, so AFTER is secretly not wrong.)
[1]: https://hiandrewquinn.github.io/til-site/posts/binary-search...
The C++ standard library phrases binary search as a partitioning problem - finding the index where a predicate goes from false to true, which is helpful for me.
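That partition-point view makes both bounds fall out of a single primitive. A minimal sketch of my own (mirroring the idea behind std::partition_point, not any standard Python API):

```python
def partition_point(n, pred):
    """Least i in [0, n] such that pred(i) is True.

    Assumes pred is monotone over 0..n-1: False...False True...True.
    Returns n if pred is False everywhere.
    """
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi) // 2
        if pred(mid):
            hi = mid          # mid is a candidate; the boundary is at or before it
        else:
            lo = mid + 1      # boundary is strictly after mid
    return lo

# lower_bound and upper_bound are just two different predicates:
a = [1, 3, 3, 5]
lower = partition_point(len(a), lambda i: a[i] >= 3)  # 1
upper = partition_point(len(a), lambda i: a[i] > 3)   # 3
```

Framing it this way means you only ever have to get one loop right; every variant is a choice of predicate.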
A research paper from Google in 2006 noted that "almost all" binary search (and merge sort) implementations contain a bug (and had for decades), so 90% seems impressive in the face of that.
https://research.google/blog/extra-extra-read-all-about-it-n...
The bug in that paper is actually the very integer overflow bug I was referring to. Given that Jon himself made that error in the "definitive" implementation, it seems unlikely that he would have spotted it in the 10% of implementations he considered correct. Under Google's stricter criteria, it seems likely to me that not a single person got the search implementation truly correct.
In JavaScript, that Google "bug" only occurs if an array has more than 4503599627370495 items. You may run into memory issues before running into that bug.
What are the odds you write a binary search that you'll use more than once instead of just running it and writing down the result?
I have an object representing piecewise linear functions, defined by a sequence of (x_i, y_i) tuples. It has a method to evaluate y for a given value of x. The first step is to find the least i for which x_i is greater than or equal to x: this uses a binary search. Initially, I stored the x values in a separate array and used dotnet's inbuilt Array.BinarySearch. Later I removed this array to save on memory and it now runs a handbuilt binary search on the first items of the array of points instead.
I once needed to write binary search for a microcontroller in C (no libraries). The routine ran about every hour, with appx 4M data points.
I once worked on lots of projects in C: microcontrollers, embedded systems, etc. It was a startup.
Every time I needed to write something algorithmically demanding, I could do it in a day or two. I'm not into leetcoding or competitive coding.
Most regular everyday programmers can work these things out in one or two workdays.
It's definitely not like the competitive programmers say: it's not as if, when you actually need to write something, you'll have no time, no internet, and no IDE, and will have to write the code in a Google Doc (which needs an internet connection anyway; when I pointed this out in interviews, they didn't like it).
That's a very surprising angle of questioning. Are you writing some sort of compile-time-only programs?
About 100%?
When do you write code that doesn't need to search? (Unless you hand it all to the database, in which case... sure, you're good)