To reduce the current monomorphism, I might add a generic version using void* and a comparator, or generate code for a few key types, while keeping the simple monomorphic listings for readability. (Though this would make the code a bit more complex)
To reduce the current monomorphism, I might add a generic version using void* and a comparator, or generate code for a few key types, while keeping the simple monomorphic listings for readability. (Though this would make the code a bit more complex)
Sedgewick published a C++ version of his book using templates for parametric polymorphism, but C++ is kind of a nightmare of a language. Unfortunately algorithms like quicksort really seem to demand parametric polymorphism, and the alternative languages with parametric polymorphism such as Haskell and ML mostly make memory allocation implicit, which runs counter to your goals. Rust offers both explicit memory management and parametric polymorphism, but Rust code is not very clean. Digital Mars D might be a possible alternative? But none of Rust, D, Pony, Zig, Nim, Odin, or even C++ are as widely understood as C, Python, and JS.
Sedgewick's C edition is the book that first showed me that programs can be beautiful. His favorite algorithm is of course Quicksort. His version of the partition function from Algorithms in C (01998 edition) is, I think, worth studying:
There are some things to dislike in this presentation, like naming a variable "l", the rather excessively horizontal layout, and the brace style. And of course he picked Hoare's original branchy partition algorithm rather than Lomuto's simpler one. But I think there are some other choices that are beneficial:• He's using short variable names to let the algorithm speak for itself.
• He uses a type Item (perhaps a typedef) to make the distinction clear between the type of array elements and the type of array indices.
• Similarly, he's using exch and less functions to abstract away the comparison and exchange operations. In the Golang standard library implementation, these are actually methods on the container to be sorted taking indices as arguments, so the concrete item type doesn't appear in the algorithm at all, which I think is an important insight. Of course the numerous
It would be possible to use the C preprocessor to compile specialized versions of the function for particular data types and comparators in a type-safe way by redefining Item, exch, and less, but of course when you do that the error messages are difficult to decipher. (But exch can't be just a regular function, because C passes arguments by value.)
Sedgewick's original presentation in the book (from the 01983 edition) inlines the partitioning into the Quicksort loop:
From my perspective the C edition is an enormous improvement. I don't have a copy of the (Pascal) second edition to compare.The third C++ edition (01998) looks almost identical to the C version, except that it's templatized on the Item type, which in C++ also permits you to use static ad-hoc polymorphism (funciton overloading) for the exch function and the comparison operator:
He illustrates the dynamic behavior of sorting algorithms in multiple ways. One is a vertical time sequence of horizontal arrays of letters, circling the pointed-to elements and graying out the unchanged ones. Another, which can scale to larger arrays, is showing a small-multiple time sequence of scatterplots, like a comic strip, where the X coordinate is the array index and the Y coordinate is the element value at that array index. (A variant of this draws bars whose heights are the element values.) From my perspective, static visualizations like this should be preferred over animation when applicable (https://worrydream.com/MagicInk/), but this case is maybe on the borderline where both animation and static visualization have unique merits.My own terse presentation of Lomuto's algorithm is http://canonical.org/~kragen/sw/dev3/paperalgo#addtoc_21. I don't think it reaches the clarity of an exposition in C or Python, but it is much easier to write on a whiteboard. I do think that expressing the partition subroutine in terms of an operation on a Golang-like slice improves the clarity significantly. In C or C++ you could pass a base pointer and a length; perhaps Sedgewick didn't want to expose students to this aspect of C arrays, or perhaps he was still constrained by Pascal thinking.
Before generics, Go sorting usually looked like this:
type ByValue []int
func (a ByValue) Len() int { return len(a) } func (a ByValue) Swap(i, j int) { a[i], a[j] = a[j], a[i] } func (a ByValue) Less(i, j int) bool { return a[i] < a[j] }
sort.Sort(ByValue(a))
It worked fine but felt a bit awkward for small programs. After Go 1.18, you can do it directly with generics:
s := []int{3, 1, 4, 1, 5, 9} slices.Sort(s)
Much simpler and expressive, yet still type safe. I really like how Go kept it minimal without adding too many abstractions.
Thanks for the very well written and detailed explanation! I think from your comment I finally see a clearer path to a better implementation. Based on my own experience, I will probably start by writing it in Go first (that is the language I can think in most naturally), and then try to reason about how to make an idiomatic C version afterward, which will be interesting since I have not really written C in about 15 years.
After finishing this book, I think extracting the Go code into a ready-to-use data structures and algorithms library could be a nice idea. I feel confident writing production-grade Go code, so it would be interesting to shape the examples into something practical and reusable beyond just learning exercises.