I've struggled with adding evals to my AI agents for the last few months, and I've felt that vibe evals should provide a path to building a robust system down the line.

I'm working on a plugin for Langfuse that automatically creates eval functions and datasets from ingested traces, based on ad-hoc user feedback.
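
To make the idea concrete, here is a rough sketch of the flow in plain Python. It is not the plugin's code and it does not call the Langfuse SDK: the `Trace` shape, the thumbs up/down feedback convention, and the `build_dataset` and `make_exact_match_eval` helpers are all hypothetical names I made up for illustration. Exact match is only a stand-in for whatever scoring an eval would really use.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Trace:
    # Hypothetical trace shape for illustration; not the Langfuse schema.
    input: str
    output: str
    feedback: int  # assumed convention: 1 = thumbs up, 0 = thumbs down


def build_dataset(traces: list[Trace]) -> list[dict]:
    """Keep positively rated traces as input/expected-output pairs."""
    return [
        {"input": t.input, "expected_output": t.output}
        for t in traces
        if t.feedback == 1
    ]


def make_exact_match_eval(dataset: list[dict]) -> Callable[[str, str], float]:
    """Build an eval function that scores 1.0 on an exact match, else 0.0."""
    expected = {item["input"]: item["expected_output"] for item in dataset}

    def evaluate(input_text: str, output_text: str) -> float:
        # Inputs that are not in the dataset score 0.0.
        return 1.0 if expected.get(input_text) == output_text else 0.0

    return evaluate


traces = [
    Trace("What is 2 + 2?", "4", feedback=1),
    Trace("Capital of France?", "Lyon", feedback=0),
]
dataset = build_dataset(traces)
evaluate = make_exact_match_eval(dataset)
```

Only the trace with positive feedback ends up in the dataset, and the generated `evaluate` function can then be run against new agent outputs for the same inputs.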