May I ask about your internal benchmark? I'm building a new set of benchmarks and a testing suite for agentic workflows using deepwalker [0]. How do you design your benchmark suite? It would be really cool if you could give more details.
I shared a bit more here: https://news.ycombinator.com/item?id=46314047.
It's pretty rudimentary though, nothing special. I also didn't know about deepwalker, it looks quite interesting. Are you building it?
I personally know the team that builds the product.