Hacker News

Thank you so much for attempting a reproduction! (I posted this on Reddit and most commenters didn't even click the link)

For the baseline you need the SIMDe headers: https://github.com/simd-everywhere/simde/tree/master/simde. These alias x86 intrinsics to ARM intrinsics. The baseline is based on the previous state of the art (https://arxiv.org/abs/1209.2137), which happens to be x86-based; compiling it with SIMDe was the highest-integrity way I could think of to compare against the previous SOTA.

Note: M1 chips specifically have notoriously bad small-shift performance, so the benchmark results will be very bad on your machine. The M3 partially fixed this and the M4 fixed it completely. My primary target is server-class rather than consumer-class hardware, so I'm not too worried about this.

The benchmark results were copy-pasted from the terminal. The README prose was AI-generated from my rough notes (I'm confident when communicating with other experts/researchers, but less so when communicating with a general audience).

  $ ./out/bytepack_eval
  Bytepack Bench — 16 KiB, reps=20000 (pinned if available)
  Throughput GB/s

  K  NEON pack  NEON unpack  Baseline pack  Baseline unpack
  1      94.77        84.05          45.01            63.12
  2     123.63        94.74          52.70            66.63
  3      94.62        83.89          45.32            68.43
  4     112.68        77.91          58.10            78.20
  5      86.96        80.02          44.32            60.77
  6      93.50        92.08          51.22            67.20
  7      87.10        79.53          43.94            57.95
  8      90.49        92.36          68.99            83.88

deadmutex 2 days ago

Here is a repro using GCE's C4A Axion instances (c4a-highcpu-72). Seems to beat Graviton? Maybe the title of the thread can be updated to a larger number :) I used the largest instance to avoid noisy-neighbor issues.

ashtonsix a day ago

Oh nice! Axion C4A and Graviton4 use the same core (Neoverse V2), so the performance difference is due to factors like clock speed and power management.

I used a geometric mean to calculate the top-line "86 GB/s" for NEON pack/unpack; so that's 91 GB/s for the C4A repro. Probably going to leave the title unmodified.
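For anyone who wants to check that arithmetic, here is a minimal sketch. It assumes the headline figure pools all 16 NEON pack and unpack values (K = 1..8) into a single geometric mean, which is my reading of the comment rather than something confirmed by the benchmark code. The numbers are copied from the bytepack_eval output quoted in this thread, and under that assumption the result comes out at roughly 91 GB/s.

```python
from statistics import geometric_mean

# NEON throughput in GB/s for K = 1..8, copied from the bytepack_eval
# output quoted in this thread.
neon_pack = [94.77, 123.63, 94.62, 112.68, 86.96, 93.50, 87.10, 90.49]
neon_unpack = [84.05, 94.74, 83.89, 77.91, 80.02, 92.08, 79.53, 92.36]

# Assumption: the top-line number is one geometric mean over all 16 values,
# not a mean of per-column means.
values = neon_pack + neon_unpack
gm = geometric_mean(values)

# Arithmetic mean for comparison; it is never below the geometric mean.
am = sum(values) / len(values)

print(f"geometric mean:  {gm:.1f} GB/s")
print(f"arithmetic mean: {am:.1f} GB/s")
```

The geometric mean is the usual choice for summarizing rates across configurations because a single fast case (K=2 pack here) pulls it up less than it would an arithmetic mean.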

ozgrakkurt 2 days ago

Super cool!

Pretty sure anyone going into this kind of post about SIMD would prefer your writing to an LLM's.
