Hacker News

That's a cool project!

I feel like the presentation of Lomuto's algorithm on p.110 would be improved by moving the i++ after the swap and making the corresponding adjustments to the accesses to i outside the loop. Also mentioning that it's Lomuto's algorithm.

These comments are probably too broad in scope to be useful this late in the project, so consider them a note to myself. C as the language for presenting the algorithms has the advantage of wide availability, not sweeping performance-relevant issues like GC under the rug, and stability, but it ends up making the implementations overly monomorphic. And some data visualizations as in Sedgewick's book would also be helpful.

tamnd 13 hours ago [ - ]

My biggest inspiration for this project, though, is The Art of Computer Programming (TAOCP), that level of depth and precision is the ultimate goal. I'm also planning to include formal proofs of all algorithms in Lean, though that could easily turn into a 10-year project.

kragen 4 hours ago [ - ]

Python probably is not suitable for any project that you want to retain value for 10 years or more; the community has a policy against maintaining backward compatibility.

tamnd 3 hours ago [ - ]

So I will stick with C. Recently, when I updated my very old knowledge, I noticed that C23 is actually pretty nice. It adds proper Unicode support, typeof and constexpr style keywords, improved atomics, and even better support for inline functions and generics.

That is also why in TAOCP, Don Knuth used MIX (and later MMIX) to keep the book language agnostic and timeless. Before Knuth's birthday next year, I am hoping to finish most of the exercises in TAOCP.

kragen 6 minutes ago [ - ]

Ambitious!

tamnd 13 hours ago [ - ]

Sedgewick's Algorithms book is great for practical learning but too tied to Java and implementation details. It is a bit shallow on theory, though the community and resources for other languages help.

That said, I personally prefer Introduction to Algorithms (CLRS) for its formal rigor and clear proofs, and Grokking Algorithms for building intuition.

The broader goal of this project is to build a well tested, reference quality set of implementations in C, Python, and Go. That is the next milestone.

kragen 6 hours ago [ - ]

Possibly you mean "too tied to Pascal" ;-)

tamnd 5 hours ago [ - ]

Was the very first edition of Sedgewick's Algorithms written in Pascal? I heard that but never actually saw that version myself.

Your comment brought back an old memory for me. My first programming language in high school was Turbo Pascal. That IDE was amazing: instant compilation, the blue screen TUI, F1 for inline help, a surprisingly good debugger, and it just felt so smooth and fast back then. No internet needed, no AI assistance, just pure focus and curiosity. Oh, how I really miss those days :)

kragen 4 hours ago [ - ]

I think the first several editions (two? three?) were in Pascal, yeah.

The first time I saw TP was on my uncle's Kaypro, which was sort of even more amazing: the screen wasn't capable of blue, the keyboard didn't have an F1 key, and the CPU wasn't capable of instant compilation. But the alternatives were assembly language and Microsoft BASIC!

tamnd 3 hours ago [ - ]

For assembly, by any chance, do you like the "pink shirt" book? It is one of my all-time favorites! I learned so much from it back in the day. Sadly, I lost my copy somewhere along the way :(

https://deprogrammaticaipsum.com/peter-norton/

kragen 3 hours ago [ - ]

I've never actually read it! I'm not sure what I would recommend if someone asked me for an introductory assembly-language book recommendation. Probably something for ARM or AMD64?

tamnd 2 hours ago [ - ]

I am not an expert but have been learning RISC-V Assembly in my backlog for some years now. https://riscv-programming.org/

Feels like what x86 could have been if it started clean.

kragen 19 minutes ago [ - ]

You mean, if the 80386 hadn't been designed with 8086 backwards compatibility in mind? What attributes do you see in common? "x86 started clean" might be a reasonable description of iAPX432, RL78, Itanic, 68000, or ARM2, from different points of view.

I've only written some small and simple RISC-V code; enough to know that I like it a lot better than AMD64 but not as well as ARM32.

amelius 10 hours ago [ - ]

Sedgewick is available for C also.

tamnd 10 hours ago [ - ]

For clarification, I meant the Algorithms, 4th Edition book at https://algs4.cs.princeton.edu/home/ which is entirely in Java. All the example code, libraries, and exercises there use Java, and the authors explicitly note that the book is written for that language.

However, you are right, Prof. Sedgewick has long maintained versions of his material across multiple languages. I remember that the third edition has C, C++ and Java versions.

tamnd 13 hours ago [ - ]

For visualization, I'm considering using the awesome Manim library by 3Blue1Brown (https://3b1b.github.io/manim/getting_started/quickstart.html). I'm not very good at visual design, but let see what I can come up with.

tamnd 13 hours ago [ - ]

To reduce the current monomorphism, I might add a generic version using void* and a comparator, or generate code for a few key types, while keeping the simple monomorphic listings for readability. (Though this would make the code a bit more complex)

kragen 3 hours ago [ - ]

Sedgewick published a C++ version of his book using templates for parametric polymorphism, but C++ is kind of a nightmare of a language. Unfortunately algorithms like quicksort really seem to demand parametric polymorphism, and the alternative languages with parametric polymorphism such as Haskell and ML mostly make memory allocation implicit, which runs counter to your goals. Rust offers both explicit memory management and parametric polymorphism, but Rust code is not very clean. Digital Mars D might be a possible alternative? But none of Rust, D, Pony, Zig, Nim, Odin, or even C++ are as widely understood as C, Python, and JS.

Sedgewick's C edition is the book that first showed me that programs can be beautiful. His favorite algorithm is of course Quicksort. His version of the partition function from Algorithms in C (01998 edition) is, I think, worth studying:

    int partition(Item a[], int l, int r)
     { int i = l-1, j = r; Item v = a[r];
       for (;;)
         {
           while (less(a[++i], v)) ;
           while (less(v, a[--j])) if (j == l) break;
           if (i >= j) break;
           exch(a[i], a[j]);
         }
       exch(a[i], a[r]);
       return i;
     }

There are some things to dislike in this presentation, like naming a variable "l", the rather excessively horizontal layout, and the brace style. And of course he picked Hoare's original branchy partition algorithm rather than Lomuto's simpler one. But I think there are some other choices that are beneficial:

• He's using short variable names to let the algorithm speak for itself.

• He uses a type Item (perhaps a typedef) to make the distinction clear between the type of array elements and the type of array indices.

• Similarly, he's using exch and less functions to abstract away the comparison and exchange operations. In the Golang standard library implementation, these are actually methods on the container to be sorted taking indices as arguments, so the concrete item type doesn't appear in the algorithm at all, which I think is an important insight. Of course the numerous

It would be possible to use the C preprocessor to compile specialized versions of the function for particular data types and comparators in a type-safe way by redefining Item, exch, and less, but of course when you do that the error messages are difficult to decipher. (But exch can't be just a regular function, because C passes arguments by value.)

Sedgewick's original presentation in the book (from the 01983 edition) inlines the partitioning into the Quicksort loop:

    procedure quicksort(l, r: integer)
      var v, t, i, j: integer;
      begin
      if r > l then
        begin
        v := a[r]; i := l-1; j := r;
        repeat
          repeat i := i+1 until a[i] >= v;
          repeat j := j-1 until a[j] <= v;
          t := a[i]; a[i] := a[j]; a[r] := t;
        until j <= i;
        a[j] := a[i]; a[i] := a[r]; a[r] := t;
        quicksort(l, i-l);
        quicksort(i+1, r)
        end
      end;

From my perspective the C edition is an enormous improvement. I don't have a copy of the (Pascal) second edition to compare.

The third C++ edition (01998) looks almost identical to the C version, except that it's templatized on the Item type, which in C++ also permits you to use static ad-hoc polymorphism (funciton overloading) for the exch function and the comparison operator:

   template <class Item>
   int partition(Item a[], int l, int r)
     { int i = l-1, j = r; Item v = a[r];
       for (;;)
         {
           while (a[++i] < v) ;
           while (v < a[--j]) if (j == l) break;
           if (i >= j) break;
           exch(a[i], a[j]);
         }
         exch(a[i], a[r]);
         return i;
      }

He illustrates the dynamic behavior of sorting algorithms in multiple ways. One is a vertical time sequence of horizontal arrays of letters, circling the pointed-to elements and graying out the unchanged ones. Another, which can scale to larger arrays, is showing a small-multiple time sequence of scatterplots, like a comic strip, where the X coordinate is the array index and the Y coordinate is the element value at that array index. (A variant of this draws bars whose heights are the element values.) From my perspective, static visualizations like this should be preferred over animation when applicable (https://worrydream.com/MagicInk/), but this case is maybe on the borderline where both animation and static visualization have unique merits.

My own terse presentation of Lomuto's algorithm is http://canonical.org/~kragen/sw/dev3/paperalgo#addtoc_21. I don't think it reaches the clarity of an exposition in C or Python, but it is much easier to write on a whiteboard. I do think that expressing the partition subroutine in terms of an operation on a Golang-like slice improves the clarity significantly. In C or C++ you could pass a base pointer and a length; perhaps Sedgewick didn't want to expose students to this aspect of C arrays, or perhaps he was still constrained by Pascal thinking.

tamnd 2 hours ago [ - ]

Before generics, Go sorting usually looked like this:

type ByValue []int

func (a ByValue) Len() int { return len(a) } func (a ByValue) Swap(i, j int) { a[i], a[j] = a[j], a[i] } func (a ByValue) Less(i, j int) bool { return a[i] < a[j] }

sort.Sort(ByValue(a))

It worked fine but felt a bit awkward for small programs. After Go 1.18, you can do it directly with generics:

s := []int{3, 1, 4, 1, 5, 9} slices.Sort(s)

Much simpler and expressive, yet still type safe. I really like how Go kept it minimal without adding too many abstractions.

kragen 5 minutes ago [ - ]

Yup. You might want to indent your code four spaces.

Compare how it's done in ML, which has had generics for almost 50 years: https://ocaml.org/cookbook/sorting-lists-and-arrays/stdlib https://ocaml.org/manual/5.4/api/List.html#1_Sorting

tamnd 2 hours ago [ - ]

Thanks for the very well written and detailed explanation! I think from your comment I finally see a clearer path to a better implementation. Based on my own experience, I will probably start by writing it in Go first (that is the language I can think in most naturally), and then try to reason about how to make an idiomatic C version afterward, which will be interesting since I have not really written C in about 15 years.

kragen 10 minutes ago [ - ]

I hope some of my thoughts turn out to be useful! I was a bit reluctant to offer most of them for fear they might sound like an attack, especially the things that would involve large changes to what you've already written.

Golang, like Python and JS, is garbage-collected; in some cases this is helpful for focusing attention on the central semantics of the algorithms, but it does have the disadvantage that it makes reasoning about space behavior and runtime performance much more difficult. One of Quicksort's chief benefits, for example, is that it's in-place, which the Python version in your "cheat sheet" isn't.

tamnd 2 hours ago [ - ]

After finishing this book, I think extracting the Go code into a ready-to-use data structures and algorithms library could be a nice idea. I feel confident writing production-grade Go code, so it would be interesting to shape the examples into something practical and reusable beyond just learning exercises.