Pretty cool! How do you build these "stories" from the news?

Cheers and thank you! I'll reshare an earlier comment that I think answers your question - let me know:

Thanks so much for the kind words - it's 100% o3-mini for clustering. I have zero editorial input as to what constitutes a cluster, what's "top" news, etc.

The one subtlety is setting up the LLM to decide whether a new story belongs in an existing cluster or, together with more than one neighbor, constitutes a new cluster. The challenge there is scoping the clustering window (how many hours of stories to consider) and the topic breadth, to avoid creating Katamari-style super-clusters that just end up with every story attached to them.

At this point I seem to have found a sweet spot re: the hours window, the frequency of processing, and the design of the prompt, such that it's working consistently.

Very few false positives (spurious clusters being created) or false negatives (potential clusters being missed).
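
For anyone who wants something concrete, here is a rough sketch of the windowing and prompt setup described above. It is not the actual code or prompt behind the site: the `Story` type, the 24-hour window, the prompt wording, and the function names are all assumptions made up for illustration, and the LLM call itself is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Story:
    # Hypothetical record type; field names are illustrative only.
    id: str
    title: str
    published: datetime
    cluster_id: Optional[str] = None


def stories_in_window(stories: List[Story], now: datetime, window_hours: int) -> List[Story]:
    """Keep only stories published within the last `window_hours` hours."""
    cutoff = now - timedelta(hours=window_hours)
    return [s for s in stories if s.published >= cutoff]


def build_prompt(new_story: Story, recent: List[Story]) -> str:
    """Build a prompt asking the model to join a cluster, start one, or do neither."""
    lines = [
        "You group news stories that cover the same specific event.",
        "Recent stories (id | cluster | title):",
    ]
    for s in recent:
        lines.append(f"{s.id} | {s.cluster_id or 'unclustered'} | {s.title}")
    lines.append(f"New story: {new_story.title}")
    # Requiring at least two neighbors mirrors the "> 1 neighbors" rule above.
    lines.append(
        "Answer with an existing cluster id, or NEW plus the ids of at least "
        "two related unclustered stories, or NONE."
    )
    return "\n".join(lines)
```

Only the stories inside the window are shown to the model, so a shorter window or a narrower "same specific event" instruction are the two knobs that keep a cluster from absorbing everything.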

Very interesting, how do you do that? Do you limit what you feed it, or do you handle it via custom instructions? I had a similar case, so I would love to hear how you are doing the prompting here.

In my case we went with embeddings and clustering to find papers close to each other, because the LLMs were hallucinating.
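
To illustrate the embeddings-plus-clustering route, here is a minimal sketch under some loud assumptions: the vectors are toy 2-D values standing in for real embeddings, the greedy centroid scheme and the 0.8 threshold are picked for illustration, and none of it is the actual pipeline mentioned above.

```python
import math
from typing import List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def greedy_cluster(vectors: List[List[float]], threshold: float = 0.8) -> List[List[int]]:
    """Put each vector in the most similar cluster above `threshold`, else start a new one."""
    clusters: List[List[int]] = []      # member indices per cluster
    centroids: List[List[float]] = []   # running mean vector per cluster
    for i, v in enumerate(vectors):
        best, best_sim = None, threshold
        for c, centroid in enumerate(centroids):
            sim = cosine(v, centroid)
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append([i])
            centroids.append(list(v))
        else:
            clusters[best].append(i)
            n = len(clusters[best])
            # Update the running mean with the new member.
            centroids[best] = [(m * (n - 1) + x) / n for m, x in zip(centroids[best], v)]
    return clusters


# Toy "embeddings": two items near the x-axis, two near the y-axis.
papers = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.95]]
print(greedy_cluster(papers))
```

Because assignment is pure arithmetic on the vectors, there is nothing for a model to hallucinate; the trade-off is that the threshold has to be tuned by hand.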