I ran some tests of a similar concept using LangGraph. Unfortunately, while the results are meaningfully different from a foundation model's, I don't think they can substitute for real-world data yet, and they don't provide enough diversity of thought and opinion between cases. For instance, if you ask 1,000 people, you get a different response every time because each person is slightly unique. With an LLM, the responses differ only probabilistically: not because of genuine individual differences, but for meta reasons.