it is not easy for a compiler to auto-vectorize scalar code

a pragmatic approach: write in a high-level interpreted language whose design fits modern CPUs: vector extensions, memory bandwidth

e.g. apl [0], bqn [1], k [2], kiwi [3]

  - vectors are dense (not boxed)
  - optimized internal representation (e.g. bitpacked bool vectors)
  - primitives act on whole vectors and use simd (avx, neon) where available

[0] https://www.dyalog.com
[1] https://mlochbaum.github.io/BQN/
[2] https://kx.com
[3] https://kiwilang.com
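
a rough sketch of the dense + bitpacked idea from the list above, in plain python (stdlib only). this is not how any of these interpreters are implemented; the helper names (pack_bools, count_true, bits_and) are made up for illustration, and a real implementation would do this in native code with simd:

```python
from array import array

def pack_bools(flags):
    """pack a list of bools into a bytearray, 8 flags per byte (lowest bit first)."""
    out = bytearray((len(flags) + 7) // 8)
    for i, f in enumerate(flags):
        if f:
            out[i >> 3] |= 1 << (i & 7)
    return out

def count_true(packed):
    # counting set bits over the whole buffer replaces a per-element loop
    return bin(int.from_bytes(packed, "little")).count("1")

def bits_and(a, b):
    # elementwise "and" of two bool vectors is one wide bitwise and
    x = int.from_bytes(a, "little") & int.from_bytes(b, "little")
    return bytearray(x.to_bytes(len(a), "little"))

# dense, unboxed float64 storage: 8 bytes per element, no per-item objects
xs = array("d", [1.5, -2.0, 3.25, 0.0, 7.0, -1.0, 4.0, 9.5, 2.0])

positive = pack_bools([x > 0 for x in xs])  # 9 flags fit in 2 bytes
small = pack_bools([x < 5 for x in xs])
both = bits_and(positive, small)

print(len(positive), count_true(positive), count_true(both))
```

the point is the layout: comparisons produce one bit per element, so masks are 64x smaller than the float64 data they describe, and combining or counting them touches very little memory.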

great article by marshall lochbaum on BQN performance compared to C, and how to think about it

https://mlochbaum.github.io/BQN/implementation/versusc.html

related:

  - columnar databases: kdb, duckdb, clickhouse
  - machine learning frameworks: pytorch, keras, jax, mlx