Thanks so much for the kind words - it's 100% o3-mini for clustering. I have zero editorial input as to what constitutes a cluster, what's "top" news, etc.

The one subtlety is setting up the LLM to understand whether a new story belongs in an existing cluster or, together with > 1 neighbors, constitutes a new cluster. The challenge there is scoping the clustering window (how many hours of stories are up for consideration) and the topic breadth, to avoid creating Katamari-style super-clusters that end up with every story attached to them.
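To make that decision concrete, here is a minimal sketch of the windowed join-or-create logic. It is an illustration under stated assumptions, not the actual pipeline: the real topic judgment is made by the LLM, so `same_topic` below is a hypothetical stand-in that uses plain word overlap, and `WINDOW_HOURS`, `MIN_NEIGHBORS`, and all class and function names are made up for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

WINDOW_HOURS = 24   # assumed window size; the real value is a tuning choice
MIN_NEIGHBORS = 2   # "> 1 neighbors" are needed to start a new cluster


@dataclass
class Story:
    id: str
    title: str
    published: datetime


@dataclass
class Cluster:
    id: int
    stories: list = field(default_factory=list)


def same_topic(a: Story, b: Story) -> bool:
    """Hypothetical placeholder for the LLM call: Jaccard overlap of title words."""
    wa, wb = set(a.title.lower().split()), set(b.title.lower().split())
    return len(wa & wb) / len(wa | wb) >= 0.5


def assign(story, clusters, unclustered, now):
    """Join an existing cluster, start a new one, or leave the story unclustered."""
    cutoff = now - timedelta(hours=WINDOW_HOURS)

    # 1. Compare only against cluster members inside the window, so an old
    #    cluster cannot keep absorbing stories indefinitely.
    for cluster in clusters:
        recent = [s for s in cluster.stories if s.published >= cutoff]
        if any(same_topic(story, s) for s in recent):
            cluster.stories.append(story)
            return cluster

    # 2. Otherwise look for enough recent unclustered neighbors to justify
    #    creating a new cluster.
    neighbors = [s for s in unclustered
                 if s.published >= cutoff and same_topic(story, s)]
    if len(neighbors) >= MIN_NEIGHBORS:
        cluster = Cluster(id=len(clusters) + 1, stories=neighbors + [story])
        clusters.append(cluster)
        for s in neighbors:
            unclustered.remove(s)
        return cluster

    # 3. Not enough evidence yet: hold the story for later consideration.
    unclustered.append(story)
    return None
```

Shrinking `WINDOW_HOURS` or tightening `same_topic` makes clusters narrower and shorter-lived; loosening either is what tends to produce the everything-sticks super-cluster.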

At this point I seem to have found a sweet spot re: the hours window, the frequency of processing, and the design of the prompt such that it's working consistently.

Very few false positives (spurious clusters being created) or false negatives (potential clusters being missed).