Rewriting the backend of Bitwise Cloud, my Claude Code plugin for semantic search over embedded systems docs, from Python to Go.
The problem was the ML dependencies. The backend uses BGE-small-en-v1.5 for embeddings and FAISS for vector search: FAISS is C++ with Python bindings, and the usual BGE runtimes are the same story. Using them from Go means CGO, which means a C toolchain in your build, platform-specific binaries, and the end of go get && go build.
So I wrote both from scratch in pure Go.
goformer (https://www.mikeayles.com/blog/goformer/) loads HuggingFace safetensors directly and runs BERT inference. No ONNX export step, no Python in the build pipeline. It produces embeddings that match the Python reference to cosine similarity > 0.9999. It's 10-50x slower than ONNX Runtime, but for my workload (embed one short query at search time, batch ingest at deploy time) 154ms per embedding is noise.
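The parity check itself needs nothing from either library; cosine similarity is a few lines of Go. A minimal standalone sketch, with stand-in vectors where the real 384-dim embeddings from goformer and the Python reference would go:

    package main

    import (
        "fmt"
        "math"
    )

    // cosine returns the cosine similarity between two equal-length vectors.
    func cosine(a, b []float32) float64 {
        var dot, na, nb float64
        for i := range a {
            dot += float64(a[i]) * float64(b[i])
            na += float64(a[i]) * float64(a[i])
            nb += float64(b[i]) * float64(b[i])
        }
        return dot / (math.Sqrt(na) * math.Sqrt(nb))
    }

    func main() {
        // Stand-in vectors; in practice these are 384-dim BGE-small embeddings
        // from goformer and from the Python reference implementation.
        goVec := []float32{0.1, 0.2, 0.3}
        pyVec := []float32{0.1, 0.2, 0.3}
        fmt.Printf("cosine = %.6f (want > 0.9999)\n", cosine(goVec, pyVec))
    }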
goformersearch (https://www.mikeayles.com/blog/goformersearch/) is the vector index. Brute-force and HNSW, same interface, swap with one line. I couldn't justify pulling in FAISS for the index sizes I'm dealing with (10k-50k vectors), and the pure Go HNSW searches in under 0.5ms at 50k vectors. Had to settle for HNSW over FAISS's IVF-PQ, but at this scale the recall tradeoff is fine.
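The "same interface" part is ordinary Go interface plumbing. Here's an illustrative sketch of the shape, not goformersearch's actual exports (Index, bruteForce, and the method signatures are names I made up): a tiny exact-scan index behind an interface an HNSW type would also satisfy, so the swap really is one line.

    package main

    import (
        "fmt"
        "sort"
    )

    // Index is a placeholder interface; an HNSW implementation would satisfy
    // the same contract, so callers change one line to switch.
    type Index interface {
        Add(id int, vec []float32)
        Search(query []float32, k int) []int
    }

    // bruteForce is an exact scan over all stored vectors.
    type bruteForce struct {
        ids  []int
        vecs [][]float32
    }

    func (b *bruteForce) Add(id int, vec []float32) {
        b.ids = append(b.ids, id)
        b.vecs = append(b.vecs, vec)
    }

    func (b *bruteForce) Search(query []float32, k int) []int {
        type hit struct {
            id    int
            score float32
        }
        hits := make([]hit, 0, len(b.ids))
        for i, v := range b.vecs {
            var dot float32
            for j := range v {
                dot += v[j] * query[j] // dot product == cosine on unit vectors
            }
            hits = append(hits, hit{b.ids[i], dot})
        }
        sort.Slice(hits, func(i, j int) bool { return hits[i].score > hits[j].score })
        if k > len(hits) {
            k = len(hits)
        }
        out := make([]int, k)
        for i := range out {
            out[i] = hits[i].id
        }
        return out
    }

    func main() {
        var idx Index = &bruteForce{} // the one line to change when swapping in HNSW
        idx.Add(1, []float32{1, 0})
        idx.Add(2, []float32{0, 1})
        fmt.Println(idx.Search([]float32{0.9, 0.1}, 1)) // -> [1]
    }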
The interesting bit was finding the crossover point where HNSW beats brute-force. At 384 dimensions it's around 2,400 vectors. Below that, just scan everything; the graph overhead isn't worth it. I wrote it up with benchmarks against FAISS for reference.
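If you want that crossover baked into index selection, it's a one-liner on top of the Index sketch above. newHNSW is a stand-in for whatever HNSW constructor you're using, and 2,400 is the number from my 384-dim benchmarks; re-measure for other dimensions.

    // Extends the Index/bruteForce sketch above; newHNSW is a placeholder.
    const crossover = 2400 // measured at 384 dims; benchmark your own setup

    func chooseIndex(expectedVectors int) Index {
        if expectedVectors < crossover {
            return &bruteForce{} // exact scan wins below the crossover
        }
        return newHNSW() // graph overhead pays off above it
    }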
Together they're a zero-dependency semantic search stack. go get both libraries, download a model from HuggingFace, and you have embedding generation + vector search in a single static binary. No Python, no Docker, no CGO.
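End to end, the shape is roughly this. Every call into the two libraries below is a hypothetical placeholder (Load, Embed, NewHNSW, Add, Search), and the import paths assume the modules live at the repo roots; check the repos for the real API.

    package main

    import (
        "fmt"
        "log"

        "github.com/MichaelAyles/goformer"       // embedding generation
        "github.com/MichaelAyles/goformersearch" // vector index
    )

    func main() {
        // Hypothetical API: load BGE-small-en-v1.5 safetensors downloaded
        // from HuggingFace.
        model, err := goformer.Load("./bge-small-en-v1.5")
        if err != nil {
            log.Fatal(err)
        }
        idx := goformersearch.NewHNSW(384) // hypothetical constructor

        // Batch ingest at deploy time.
        docs := []string{"GPIO pin configuration", "UART baud rate setup"}
        for i, d := range docs {
            vec, _ := model.Embed(d) // hypothetical method
            idx.Add(i, vec)
        }

        // One short query at search time.
        q, _ := model.Embed("how do I set the serial speed?")
        fmt.Println(idx.Search(q, 1))
    }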
Is it better than ONNX/FAISS? Heck no. I just did it because I wanted to try out Go.
goformer: https://github.com/MichaelAyles/goformer
goformersearch: https://github.com/MichaelAyles/goformersearch