I wonder if I still have the link.

One of the papers I had bookmarked when toying with my own language design was someone that had worked out how to make interfaces as fast or faster than vtables by using perfect hashing and using the vtable as a hash table instead of a list.

You can also, when inlining a polymorphic call, put a conditional block in that bounces back to full dispatch if the call occasionally doesn’t match the common case. The problem with polymorphic inlining though is that it quickly resembles the exact sort of code we delete and replace with polymorphic dispatch:

    if (typeof arg1 == “string”) {
    } else if typeof arg1 === …) {
    } else if {
    } else if {
    } else {
    }

I've been thinking through what features I'd want in a language if I were designing one myself, and one of my desires is to have exhaustive matches on enums (which could be made of any primitive type) and sum types. The ability to generate perfect hashes at compile time was one of the things that falls out nicely from that

Nice one, TIL

One caveat with "hash vtables" is that you only really see a performance win when the interface has a lot of specializations.

As I just mentioned in another reply, the problem they were trying to solve was hierarchies where it makes sense for a group of types to be constructed by the combination of two or three narrowly scoped interfaces.

For instance, if you treat some collections as read only, you can define comprehensions across them with a single implementation. But that means the mutators have to be contained in another type, which a subset will implement, and may have covariant inputs.

> using the vtable as a hash table instead of a list.

Could you explain this a bit more? The word "list" makes me think you might be thinking that virtual method lookup iterates over each element of the vtable, doing comparisons until it finds a match -- but I'm certain that this is not how virtual method invocation works in C++. The vtable is constructed at compile time and is already the simplest possible "perfect hashtable": a short, dense array with each virtual method mapping to a function pointer at a statically known index.

The problem they were trying to solve was multiple inheritance, and by nominal type not by code reuse. So interfaces, basically.

So these guys essentially assigned a hashcode to every function of every interface and then you would do dispatch instead of obj.vtable[12] you would do modular math x = singature.hash % len(obj.vtable) and call that.

I believe this was sometime around 2005-2008 and they found that it was fast enough on hardware of that era to be usable.

Thanks, I think I get it now. The hash value would be a pure function of the method's signature (argument types and return type) and its name, so that two interfaces with a same-name, same-signature method would hash to the same value and thus invoke the same underlying method; the constraints would be that, after modulo, different methods must map to different indices; and the objective function to minimise would be the vtable size (which I think would be common across all classes).

But maybe I don't get it, since this would require knowledge of all interfaces, and as soon as you require that, it's straightforward to build a minimal-size mapping from method name+signature to integer index: e.g., just form the union of all method declarations appearing in any interface, sort them lexicographically, and use a method's position in this sorted list as its index. Lookups in this map are only ever done at compile time so there's no runtime inefficiency to worry about.

"list" here does not refer to a "linked list". In more academic circles, a "list" referes to any linear container. Such as a Python List. In practice, C++ vtables are effectively structs containing function pointers.

That's because that type of code is actually better performing than the dynamic dispatch.

There's absolutely nothing wrong with this code. It's just that it's not as extensible

It's a 'closed world' representation where the code assumes it knows about every possibility. This make extension more difficult

The code itself is extraordinarily good and performant.