I love this concept! I have always believed that older methods from NLP and statistics can be better and faster than newer LLM technologies like embeddings, depending on the scenario. Will the code be open-sourced someday? I'd be thrilled to learn from it.
I think there is so much value and room to grow by leveraging a statistical foundation. We’re still iterating really quickly on the low-level C code across a variety of applications (pharma, scRNA, text), so it might be a while before we release it standalone.
We do offer an API layer over the low-level statistics code (the website is a light layer above this), focused on making it super easy to apply to language data, if you are interested in playing around with it: https://docs.sturdystatistics.com
Oops, I didn't notice you already have a business model. Making it a platform is surely better for long-term development. Wish it success!