This is great, I'll be returning to this tool often. Thanks.
A few suggestions and ideas for further projects:
-allow for "keyword" and -negate operators and "multi-word string" searches. [PubMed](https://pubmed.ncbi.nlm.nih.gov/advanced) is what I'd consider an ideal search interface.
-allow for regex or direct SQL lookups with a capped query time, rate-limited by proof of work (PoW). For example, if the server is under load, require a token from something like [anubis](https://anubis.techaro.lol/) and lower the maximum DB query time.
-index the titles of all discussion/forum type posts with a vector DB for semantic search, and add an option to sort by replies (like [answer overflow](https://github.com/AnswerOverflow/AnswerOverflow)). This would make it possible to find relevant discussions among ~60B messages. ScyllaDB doesn't support vector search, so I'd suggest something like [usearch](https://github.com/unum-cloud/usearch) for a detached index. Embedding models are faster and smaller than most people realize; pick whatever's on top of the [MTEB leaderboard](https://huggingface.co/spaces/mteb/leaderboard) after deciding on size.
-calculate the Jaccard similarity (user overlap) between the member lists of Discord servers. This would allow for searching in "similar" servers and, potentially, mapping Discord the way these maps do for [github](https://anvaka.github.io/map-of-github) and [reddit](https://anvaka.github.io/map-of-reddit).
-fix doxing. Searching by <@userid> is currently possible.
-expect the alternative to the Cloudflare captcha to be abused; it's too simple for modern solvers.
-open source the stack? I'm interested in the scraper.
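
To make the Jaccard suggestion above concrete, here's a rough sketch of what I have in mind. The server names, user IDs, and the `similar_servers` helper are all made up for illustration; I don't know how the real member data is stored, so treat this as the idea rather than an implementation.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


# Hypothetical data: server name -> set of member user IDs.
servers = {
    "rust-lang": {1, 2, 3, 4},
    "python": {3, 4, 5, 6},
    "knitting": {7, 8},
}


def similar_servers(members: dict, min_score: float = 0.0) -> list:
    """Return (server_a, server_b, score) for pairs scoring above min_score,
    most similar first."""
    pairs = []
    for (name_a, a), (name_b, b) in combinations(members.items(), 2):
        score = jaccard(a, b)
        if score > min_score:
            pairs.append((name_a, name_b, score))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


for name_a, name_b, score in similar_servers(servers):
    print(f"{name_a} <-> {name_b}: {score:.2f}")
```

Comparing every pair like this is quadratic in the number of servers, so at real scale you'd probably want an approximation such as MinHash, but the resulting scores are exactly the edge weights a map like the github/reddit ones would need.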