Hey HN, this will likely interest you if you're a) into dense data visualization or b) trying to figure out how to measure the quality of a retrieval-augmented generation (RAG) system.
There's an OSS tool called Open RAG Eval that analyzes RAG-based query-and-answer sets to generate a dense set of metrics in an "evaluation report". This report is in CSV format and the data is basically impossible for a human to read because there's so much of it.
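To give a sense of the shape of that data, here's a minimal sketch of poking at such a report with pandas. The column names (`query`, `metric`, `score`) are hypothetical, just to illustrate the long, flat format that makes raw CSV hard to scan by eye:

```python
import io
import pandas as pd

# Hypothetical sample of an evaluation report in CSV form; the real
# Open RAG Eval columns may differ -- this only illustrates the shape.
raw = io.StringIO(
    "query,metric,score\n"
    "What is RAG?,citation_precision,0.82\n"
    "What is RAG?,answer_relevance,0.91\n"
    "How is recall computed?,citation_precision,0.64\n"
    "How is recall computed?,answer_relevance,0.77\n"
)

df = pd.read_csv(raw)

# Collapse per-question rows into a per-metric summary.
summary = df.groupby("metric")["score"].mean()
print(summary)
```

Multiply those rows by dozens of questions and metrics across several reports and the need for an interactive, comparable view becomes obvious.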
I built Open Evaluation to let folks load in a report and visualize the evaluation metrics in a more human-readable way. The challenge was the sheer amount of information to visualize. I went with a collapsible table with sticky headers to present the info, so you can compare metrics across reports and questions. I also tried to make everything clickable, so if you want to understand the meaning behind a metric you can just click it to open an info panel and learn more.
The site has built-in sample evaluation reports, so you can try it out without generating your own. If you give it a shot, please share your feedback. I'd love to find ways to make this more usable.
Full disclosure: I did this for work and my coworkers also made Open RAG Eval.