Hacker News

I'm not sure what's the insight in this article here. All the things mentioned seem pretty straightforward. But I'll comment on a few:

> Really any finite set can be a "dimension".

This is absolutely true and yet typical programming languages don't do a good job of unleashing this. When is the last time you wish you could define an array where the indices are from a custom finite set, perhaps a range 100 to 200, perhaps an enum? I only know of two languages, Pascal and Haskell, that are sufficiently flexible. All other languages insist that indices for arrays must be integers either starting from 0 or 1. Haskell has a mechanism to let you control what is an array index: https://hackage.haskell.org/package/base-4.18.1.0/docs/Data-... The lack of this ability makes programmers use more general hash maps by default and therefore leave performance on the table.

> We can also transform Row -> PowerSet(Col) into Row -> Col -> Bool, aka a boolean matrix.

I mean sure. Does the author know type signatures and their operations like this are isomorphic to algebraic operations on size of the possibly infinite sets? The function A -> B has exactly |B|^|A| implementations so that's why Col -> Bool is in this sense "same" as PowerSet(Col). And the function operator associates to the right so adding Row -> doesn't change anything. Wait till you learn why sum types are called sum types and why product types are called product types. Here's another thing that might be mind-blowing: arrays of an unknown length can be thought of as the infinite union of arrays of lengths of all natural numbers: Array<T, 0> | Array<T, 1> | … so using the above notation the cardinality is 1 + |T| + |T|^2 + …; but they can also be represented as a standard linked list List<T>=Nil|(T, List<T>) and using the above notation we have |List<T>|=1+|T|*|List<T>| which simplifies to |List<T>|=1/(1-|T|). And guess what, the Taylor expansion of that is just the earlier 1 + |T| + |T|^2 + … precisely.

Older versions of Perl allowed you to set a lexically-scoped array base integer. It defaulted to zero and the docs mentioned 1-based arrays, but IIRC it could be any integer. This was deprecated several years ago in 5.12 as a harmful practice. Specifying using features of a Perl version of 5.16 or newer actually makes assigning to it (except for 0) a compile-time error.

Interestingly, though, Perl’s native arrays are decidedly not an array of primitives. They are arrays of scalar values, and a scalar value can not only contain a character, a string, an integer, a float, or a reference to another item (scalar, hash, array, blessed object, filehandle, etc) but it can return different values when called in a numeric context vs a string context. There are certainly ways to get at primitive values, but it’s not in Perl’s native array semantics.

$ python -m timeit --setup 'lookup = dict(zip("abcdefghijklmnopqrstuvwxyz", range(26)))' '[lookup[x] for x in "abcdefghijklmnopqrstuvwxyz"]' 500000 loops, best of 5: 884 nsec per loop $ python -m timeit '[ord(x) - 97 for x in "abcdefghijklmnopqrstuvwxyz"]' 200000 loops, best of 5: 1.21 usec per loop

Jtsummers 3 days ago [ - ]

> I only know of two languages, Pascal and Haskell, that are sufficiently flexible.

Julia, Fortran, some BASICs allow custom integer ranges.

Ada allows you to use any discrete range (integer, character, enums).

https://en.wikipedia.org/wiki/Comparison_of_programming_lang... - A more comprehensive list

The two columns of interest are "Specifiable index type" and "Specifiable base index".

cestith 3 days ago [ - ]

pklausler 3 days ago [ - ]

Non-default lower bounds in Fortran are a famous pitfall, however; they don't persist in many cases where one might think that they do, and are not 100% portable across compilers.

3 days ago [ - ]

[deleted]

zahlman 3 days ago [ - ]

> When is the last time you wish you could define an array where the indices are from a custom finite set, perhaps a range 100 to 200, perhaps an enum?

> ...

> The lack of this ability makes programmers use more general hash maps by default and therefore leave performance on the table.

You can simulate this in any language by hash-mapping the custom set elements to indices. (And in the general case, you won't be able to determine the indices faster than a hash table lookup.)

In fact, I imagine there are languages and implementations where hash table lookup is the fastest way to map 'a':'z' to 0:25.

In fact, let me try a (surely totally bogus) micro-benchmark right now:

Huh.

...And surely if you use the hash map directly, instead of using it to grab an index into another container, overall performance only gets better (unless maybe you need to save per-instance memory, and you have multiple containers using the same custom index scheme).

Hash tables are a pretty neat idea.