I'm currently improving the testing side of our infrastructure at Morphik (https://github.com/morphik-org/morphik-core). There are a lot of different RAG evaluations out there, and I'm curious which ones you care about most.
Of course, the answer will vary by use case, so I'd love to hear about yours: which benchmark or eval matters most for it, and why?