Rewriting the backend of Bitwise Cloud, my Claude Code plugin for semantic search over embedded systems docs, from Python to Go.
The problem was the ML dependencies. The backend uses BGE-small-en-v1.5 for embeddings and FAISS for vector search: FAISS is C++ with Python bindings, and the usual BGE runtimes are the same story. Using them from Go means CGO, which means a C toolchain in your build, platform-specific binaries, and the end of go get && go build.
So I wrote both from scratch in pure Go.
goformer (https://www.mikeayles.com/blog/goformer/) loads HuggingFace safetensors directly and runs BERT inference. No ONNX export step, no Python in the build pipeline. It produces embeddings that match the Python reference to cosine similarity > 0.9999. It's 10-50x slower than ONNX Runtime, but for my workload (embed one short query at search time, batch ingest at deploy time) 154ms per embedding is noise.
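The parity check itself needs nothing from either library; cosine similarity is a few lines of Go. A minimal standalone sketch, with stand-in vectors where the real 384-dim embeddings from goformer and the Python reference would go:

    package main

    import (
        "fmt"
        "math"
    )

    // cosine returns the cosine similarity between two equal-length vectors.
    func cosine(a, b []float32) float64 {
        var dot, na, nb float64
        for i := range a {
            dot += float64(a[i]) * float64(b[i])
            na += float64(a[i]) * float64(a[i])
            nb += float64(b[i]) * float64(b[i])
        }
        return dot / (math.Sqrt(na) * math.Sqrt(nb))
    }

    func main() {
        // Stand-in vectors; in practice these are 384-dim BGE-small embeddings
        // from goformer and from the Python reference implementation.
        goVec := []float32{0.1, 0.2, 0.3}
        pyVec := []float32{0.1, 0.2, 0.3}
        fmt.Printf("cosine = %.6f (want > 0.9999)\n", cosine(goVec, pyVec))
    }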
goformersearch (https://www.mikeayles.com/blog/goformersearch/) is the vector index. Brute-force and HNSW, same interface, swap with one line. I couldn't justify pulling in FAISS for the index sizes I'm dealing with (10k-50k vectors), and the pure Go HNSW searches in under 0.5ms at 50k vectors. Had to settle for HNSW over FAISS's IVF-PQ, but at this scale the recall tradeoff is fine.
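The "same interface" part is ordinary Go interface plumbing. Here's an illustrative sketch of the shape, not goformersearch's actual exports (Index, bruteForce, and the method signatures are names I made up): a tiny exact-scan index behind an interface an HNSW type would also satisfy, so the swap really is one line.

    package main

    import (
        "fmt"
        "sort"
    )

    // Index is a placeholder interface; an HNSW implementation would satisfy
    // the same contract, so callers change one line to switch.
    type Index interface {
        Add(id int, vec []float32)
        Search(query []float32, k int) []int
    }

    // bruteForce is an exact scan over all stored vectors.
    type bruteForce struct {
        ids  []int
        vecs [][]float32
    }

    func (b *bruteForce) Add(id int, vec []float32) {
        b.ids = append(b.ids, id)
        b.vecs = append(b.vecs, vec)
    }

    func (b *bruteForce) Search(query []float32, k int) []int {
        type hit struct {
            id    int
            score float32
        }
        hits := make([]hit, 0, len(b.ids))
        for i, v := range b.vecs {
            var dot float32
            for j := range v {
                dot += v[j] * query[j] // dot product == cosine on unit vectors
            }
            hits = append(hits, hit{b.ids[i], dot})
        }
        sort.Slice(hits, func(i, j int) bool { return hits[i].score > hits[j].score })
        if k > len(hits) {
            k = len(hits)
        }
        out := make([]int, k)
        for i := range out {
            out[i] = hits[i].id
        }
        return out
    }

    func main() {
        var idx Index = &bruteForce{} // the one line to change when swapping in HNSW
        idx.Add(1, []float32{1, 0})
        idx.Add(2, []float32{0, 1})
        fmt.Println(idx.Search([]float32{0.9, 0.1}, 1)) // -> [1]
    }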
The interesting bit was finding the crossover point where HNSW beats brute-force. At 384 dimensions it's around 2,400 vectors. Below that, just scan everything; the graph overhead isn't worth it. I wrote it up with benchmarks against FAISS for reference.
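If you want that crossover baked into index selection, it's a one-liner on top of the Index sketch above. newHNSW is a stand-in for whatever HNSW constructor you're using, and 2,400 is the number from my 384-dim benchmarks; re-measure for other dimensions.

    // Extends the Index/bruteForce sketch above; newHNSW is a placeholder.
    const crossover = 2400 // measured at 384 dims; benchmark your own setup

    func chooseIndex(expectedVectors int) Index {
        if expectedVectors < crossover {
            return &bruteForce{} // exact scan wins below the crossover
        }
        return newHNSW() // graph overhead pays off above it
    }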
Together they're a zero-dependency semantic search stack. go get both libraries, download a model from HuggingFace, and you have embedding generation + vector search in a single static binary. No Python, no Docker, no CGO.
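End to end, the shape is roughly this. Every call into the two libraries below is a hypothetical placeholder (Load, Embed, NewHNSW, Add, Search), and the import paths assume the modules live at the repo roots; check the repos for the real API.

    package main

    import (
        "fmt"
        "log"

        "github.com/MichaelAyles/goformer"       // embedding generation
        "github.com/MichaelAyles/goformersearch" // vector index
    )

    func main() {
        // Hypothetical API: load BGE-small-en-v1.5 safetensors downloaded
        // from HuggingFace.
        model, err := goformer.Load("./bge-small-en-v1.5")
        if err != nil {
            log.Fatal(err)
        }
        idx := goformersearch.NewHNSW(384) // hypothetical constructor

        // Batch ingest at deploy time.
        docs := []string{"GPIO pin configuration", "UART baud rate setup"}
        for i, d := range docs {
            vec, _ := model.Embed(d) // hypothetical method
            idx.Add(i, vec)
        }

        // One short query at search time.
        q, _ := model.Embed("how do I set the serial speed?")
        fmt.Println(idx.Search(q, 1))
    }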
Is it better than ONNX/FAISS? Heck no. I just did it because I wanted to try out Go.
goformer: https://github.com/MichaelAyles/goformer
goformersearch: https://github.com/MichaelAyles/goformersearch