With a more straightforward approach, the tool can be reproduced with just a few queries in ClickHouse.
1. Create a table with styles by authors:
CREATE TABLE hn_styles (name String, vec Array(UInt32)) ENGINE = MergeTree ORDER BY name
2. Calculate and insert style vectors (the insert takes 27 seconds):
INSERT INTO hn_styles WITH 128 AS vec_size,
cityHash64(arrayJoin(tokens(lower(decodeHTMLComponent(extractTextFromHTML(text)))))) % vec_size AS n,
arrayMap((x, i) -> i = n, range(vec_size), range(vec_size)) AS arr
SELECT by, sumForEach(arr) FROM hackernews_history GROUP BY by
3. Find nearest authors (the query takes ~50 ms):
SELECT name, cosineDistance(vec, (SELECT vec FROM hn_styles WHERE name = 'antirez')) AS dist FROM hn_styles ORDER BY dist LIMIT 25
    ┌─name────────────┬─────────────────dist─┐
 1. │ antirez         │                    0 │
 2. │ geertj          │ 0.009644324175144714 │
 3. │ mrighele        │ 0.009742538810774581 │
 4. │ LukaAl          │ 0.009787061201638525 │
 5. │ adrianratnapala │ 0.010093164015005152 │
 6. │ prmph           │ 0.010097599441156513 │
 7. │ teilo           │ 0.010187607877663263 │
 8. │ lukesandberg    │  0.01035981357655602 │
 9. │ joshuak         │ 0.010421492503861374 │
10. │ sharikous       │  0.01043547391491162 │
11. │ lll-o-lll       │  0.01051205287096002 │
12. │ enriquto        │ 0.010534816136353875 │
13. │ rileymat2       │ 0.010591026237771195 │
14. │ afiori          │ 0.010655186410089112 │
15. │ 314             │ 0.010768594792569197 │
16. │ superice        │ 0.010842615688153812 │
17. │ cm2187          │  0.01105111720031593 │
18. │ jorgeleo        │ 0.011159407590845771 │
19. │ einhverfr       │ 0.011296755160620009 │
20. │ goodcanadian    │ 0.011316316959489647 │
21. │ harperlee       │ 0.011317367800365297 │
22. │ seren           │ 0.011390119122640763 │
23. │ abnry           │ 0.011394133096140235 │
24. │ PraetorianGourd │ 0.011508457949426343 │
25. │ ufo             │ 0.011538721312575051 │
    └─────────────────┴──────────────────────┘
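
For readers who want to see what steps 2 and 3 compute without running ClickHouse, here is a rough Python sketch of the same idea: every token is hashed into one of 128 buckets, the bucket counts are summed per author, and authors are ranked by cosine distance to a reference author. It is an illustration under assumptions, not a port of the queries. The function names (bucket, style_vector, cosine_distance, nearest) are made up for this example, blake2b stands in for cityHash64 (so bucket numbers will not match ClickHouse), the regex only approximates tokens(), and the HTML stripping and entity decoding are left out.

```python
import hashlib
import math
import re

VEC_SIZE = 128  # same bucket count as vec_size in the INSERT query


def bucket(token: str) -> int:
    # Assumption: any stable 64-bit hash illustrates the idea; ClickHouse
    # uses cityHash64, so the actual bucket numbers differ from these.
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little") % VEC_SIZE


def style_vector(comments):
    # Count how many tokens of one author fall into each hash bucket.
    vec = [0] * VEC_SIZE
    for text in comments:
        # Rough stand-in for tokens(lower(...)): runs of ASCII letters/digits.
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            vec[bucket(token)] += 1
    return vec


def cosine_distance(a, b):
    # 1 - cos(angle between a and b); NaN if either vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return float("nan")
    return 1.0 - dot / (norm_a * norm_b)


def nearest(vectors, target, limit=25):
    # Rank all authors by distance to the target author's vector.
    ref = vectors[target]
    ranked = sorted(vectors, key=lambda name: cosine_distance(vectors[name], ref))
    return ranked[:limit]


if __name__ == "__main__":
    # Toy data, invented for the example.
    comments_by_author = {
        "alice": ["The quick brown fox", "the lazy dog"],
        "bob": ["The quick brown fox jumps"],
        "carol": ["zzz qqq xxx"],
    }
    vectors = {name: style_vector(texts) for name, texts in comments_by_author.items()}
    print(nearest(vectors, "alice", limit=3))
```

Because the vector only counts hashed tokens, it captures word-frequency habits rather than meaning, and the target author always comes out first with a distance of (approximately) zero, as antirez does in the output above.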