Hacker News

Thanks! I was waiting for your benchmarks. Do you plan to test Haiku 3.5 too? It would also be nice to show the API cost of running the whole benchmark, to get a better idea of how many internal tokens the o1 models consume.