I think what's missing is a benchmark that measures how much stored memories actually improve future interactions.