Instead of driving the agent with an llm, it might work to use the agent to hard code heuristics, and use some kind of a simulation to benchmark its skills? Then feeding the results back to the agent so it can improve the heuristics?
Instead of driving the agent with an llm, it might work to use the agent to hard code heuristics, and use some kind of a simulation to benchmark its skills? Then feeding the results back to the agent so it can improve the heuristics?