Genuine question, how did you get to these 11 LLMs instead of 10 or 12? I'm interested in understanding how you did benchmark these 11 LLMs or whether it was an arbitrary ensemble you selected.

Interesting idea

I get codex to use openrouter api and ask it to find 5 cheap but highly efficient LLMs at the task that km doing based on benchmarks and descriptions

I then run the query through all 5, get a markdown file for each in case I want to read through it later and have codex analyze and improve things based on those 5 outputs

It’s very easy and can scale to 11 or more LLMs with the same api

[flagged]

Does the user set up API keys for those 11 LLMs or is API cost included in the product? Do you test for tool hallucination or only information hallucination?

[flagged]

Problem: We have AI Slop.

Solution: Lets make MORE AI Slop and hope it goes away somehow.

AI Psychosis in full swing.

[flagged]

[flagged]

[flagged]