I see that you're looking for clusters within PCA projections -- you should look for deeper structure with hot new dimensionality reduction algorithms like PaCMAP or LocalMAP!
I've been working on a project related to a sensemaking tool called Pol.is [1], reprojecting its wiki survey data with these newer algorithms instead of PCA, and it's amazing what new insight they uncover! (Quick code sketch after the links below.)
https://patcon.github.io/polislike-opinion-map-painting/
Painted groups: https://t.co/734qNlMdeh
(Sorry, only really works on desktop)
[1]: https://www.technologyreview.com/2025/04/15/1115125/a-small-...
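A minimal sketch of that kind of reprojection, assuming the pacmap package (pip install pacmap) and synthetic stand-in data in place of the real vote matrix (a Pol.is export is a participants x comments matrix of agree/disagree/pass votes, usually with missing entries to handle):

    import numpy as np
    import pacmap

    rng = np.random.default_rng(0)
    # Participants x comments vote matrix: -1 = disagree, 0 = pass, 1 = agree.
    # Random stand-in; real survey data is sparser and has missing votes.
    votes = rng.choice([-1.0, 0.0, 1.0], size=(500, 80))

    # n_neighbors is the main knob; 10 is the library default.
    reducer = pacmap.PaCMAP(n_components=2, n_neighbors=10)
    xy = reducer.fit_transform(votes)
    print(xy.shape)  # (500, 2): 2-D coordinates you can paint groups on

Per the pacmap project's docs, recent releases also include a LocalMAP class with the same fit_transform interface, if you want that variant.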
Thanks for pointing those out -- I hadn't seen PaCMAP or LocalMAP before, but they definitely look like the kind of structure-preserving approach that would fit this data better than PCA. Appreciate the nudge -- going to dig into those a bit more.
Try TDA ("mapper", or really anything based on connectivity computed from kernel density) -- it's a whole new world.
This ain't your parents' "factor analysis".
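For context, the classic Mapper pipeline filters a point cloud through a lens function, covers the lens range with overlapping bins, and clusters within each bin to build a graph; a density lens gives the kernel-density flavor mentioned above. A minimal sketch with the kmapper package (random stand-in data, parameter values made up):

    import numpy as np
    import sklearn.cluster
    from sklearn.neighbors import KernelDensity
    import kmapper as km

    X = np.random.default_rng(0).normal(size=(400, 10))  # stand-in point cloud

    # Lens: log-density at each point, so connectivity in the output graph
    # is driven by kernel density estimates.
    lens = KernelDensity(bandwidth=1.0).fit(X).score_samples(X).reshape(-1, 1)

    mapper = km.KeplerMapper(verbose=0)
    graph = mapper.map(
        lens, X,
        cover=km.Cover(n_cubes=10, perc_overlap=0.3),
        clusterer=sklearn.cluster.DBSCAN(eps=1.5, min_samples=3),
    )
    mapper.visualize(graph, path_html="mapper_graph.html")  # interactive HTML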
Ooooo I will definitely check it out! It's strangely hard to find any comparisons in YouTube videos -- it seems TDA isn't actually a dimensionality reduction algorithm, but something closely related, maybe?
LLM interpretability also uses sparse autoencoders to find concept representations (https://openai.com/index/extracting-concepts-from-gpt-4/), and, more recently, linear probes.
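For anyone unfamiliar: a sparse autoencoder here is an overcomplete autoencoder trained with a sparsity penalty on its hidden code, so individual hidden units tend to fire for individual concepts. A toy PyTorch sketch (random data standing in for real model activations, dimensions made up; not the linked paper's actual training setup):

    import torch
    import torch.nn as nn

    d_model, d_dict = 256, 1024  # dictionary wider than the activation space

    class SparseAutoencoder(nn.Module):
        def __init__(self):
            super().__init__()
            self.enc = nn.Linear(d_model, d_dict)
            self.dec = nn.Linear(d_dict, d_model)

        def forward(self, x):
            z = torch.relu(self.enc(x))  # non-negative hidden code
            return self.dec(z), z

    sae = SparseAutoencoder()
    opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
    acts = torch.randn(4096, d_model)  # stand-in for LLM activations

    for _ in range(200):
        recon, z = sae(acts)
        # Reconstruction error plus an L1 penalty that pushes most of z
        # to zero, which is what makes the learned features "sparse".
        loss = ((recon - acts) ** 2).mean() + 1e-3 * z.abs().mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

A linear probe, by contrast, is just a linear classifier trained directly on those activations to test whether a concept is linearly decodable.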
I've had much better luck with UMAP than with PCA or t-SNE for reducing embeddings.
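For reference, the umap-learn call is a one-liner; n_neighbors trades off local vs. global structure and min_dist controls how tightly points clump (toy data below):

    import numpy as np
    import umap  # umap-learn package

    emb = np.random.default_rng(0).normal(size=(1000, 384))  # stand-in embeddings
    xy = umap.UMAP(n_components=2, n_neighbors=15, min_dist=0.1).fit_transform(emb)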
PaCMAP (and its descendant LocalMAP) is comparable to t-SNE at preserving both local and global structure, but without much fussing over finicky hyperparameters.
https://youtu.be/sD-uDZ8zXkc
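To make the hyperparameter point concrete, a toy side-by-side (random data, not a benchmark): t-SNE asks you to choose a perplexity and is sensitive to init choices, while PaCMAP runs here entirely on its defaults:

    import numpy as np
    import pacmap
    from sklearn.manifold import TSNE

    X = np.random.default_rng(0).normal(size=(300, 50))

    # t-SNE: output changes noticeably with perplexity / init choices.
    xy_tsne = TSNE(n_components=2, perplexity=30.0, init="pca").fit_transform(X)

    # PaCMAP: defaults only.
    xy_pacmap = pacmap.PaCMAP(n_components=2).fit_transform(X)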