A new PostgreSQL index type that outperforms B-trees for many common cases. As a wild experiment, I'm entirely vibe coding it and not hand-writing it.
It works by specializing for the common case of read-only workloads with short, fixed-length keys and INCLUDE columns (int, uuid, text <= 32 bytes, numeric, money, etc - not json), plus (optionally) repetitive key values (a common pattern with short, fixed-length keys). Indexes and tables like this show up in nearly every database: lookup tables, many-to-many JOIN relationships, materialized views of popular statistics, and so on.
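As a concrete (and purely hypothetical) sketch of the target shape - "myam" below is a placeholder access method name, since nothing is published yet:

    -- A read-mostly many-to-many join table with short, fixed-length,
    -- highly repetitive keys: the kind of workload this index targets.
    CREATE TABLE user_tags (
        user_id int NOT NULL,
        tag_id  int NOT NULL
    );

    -- "myam" is a hypothetical access method name. The point is the shape:
    -- a short fixed-length key plus a short fixed-length INCLUDE column.
    CREATE INDEX user_tags_by_tag
        ON user_tags USING myam (tag_id) INCLUDE (user_id);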
Currently, it's "starting to work," with 100% code coverage and query speed that usually matches or beats btree. Thanks to compression, it can use up to 99.95% less memory (!) and put correspondingly less "pressure" on cache/RAM/IO. Of course, there are degenerate cases (e.g. all-unique UUIDs, many INCLUDEs, etc) where it ends up about the same size. As with all indexes, performance is limited by the PostgreSQL executor's interface, which is record-at-a-time with high per-record overhead. I'd love help coding an FDW that lets aggregates (e.g. count()) be "pushed down" and executed inside the index; today the interface still requires returning every record instead of a single final answer. Even better would be an FDW interface where substantial query plans could be "pushed down", e.g. a whole COUNT().
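To make the pushdown pain concrete, this is the kind of query I have in mind (ordinary SQL against the hypothetical table above, nothing specific to this extension):

    -- With today's executor interface, every matching index entry is
    -- returned record-at-a-time and counted by the executor. With aggregate
    -- pushdown (e.g. via an FDW), the index could compute the single final
    -- answer itself and return one row.
    SELECT count(*)
    FROM user_tags
    WHERE tag_id = 42;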
The plan is to publish and open source this work.
I'd welcome collaborators and have lots of experience working on small teams at major companies. I'm based in NYC but remote is fine.
- Must be willing to work with LLMs and not "cheat" by hand-writing code.
- Usage testing: must be comfortable with PostgreSQL and indexes. No other experience required!
- Benchmarking: must know SQL indexes and have benchmarking experience - no PostgreSQL internals required.
- For internals work, must know C and SQL. PostgreSQL is tricky to learn but LLMs are excellent teachers!
- Scripting is in bash, Python, and Makefiles, but again this is all vibe coded and you can ask LLMs what it's doing.
- Any environment is fine. I'm using Linux/Docker (multi-core x86 and ARM) but would love help with Windows, native macOS, and SIMD optimization.
- I'm open to porting/moving to Rust, especially if that provides a faster path to restricted environments like AWS RDS/Aurora.
- Your ideas are welcome! But obviously we'll need to divide and conquer, since the LLMs are making rapid changes to the core and we'll have to deal with code conflicts.
DM to learn more (see my HN profile)
Huh, "cheat" by hand-writing code?
Explain