Interesting. Claude Opus 4.8 and Gemini 3.1 Lite kind of got it right, but when I ask the models directly, they say they don't know. I'm curious how the tool is doing the correlation.
Prompt for rollouts posted below (https://news.ycombinator.com/item?id=48592415). I have a bit more information on the clustering part at https://intheweights.com/about, but everything returned by the model is viewable (possibly under the "hallucinations" section).