LLMs are wordsmith oracles. A lot of effort went into coaxing interactive intelligence out of them, but the truth is that you probably always could have harnessed the base models directly to do very useful things. The instruct-tuned models give your harness even more degrees of freedom.
A while ago, the autoresearch[1] harness went viral, yet it's essentially a highly simplified version of AlphaEvolve[2][3][4].
In the cybersecurity context, you can envision a clever harness that probes every function in a codebase for vulnerabilities, bubbles the candidates up to their call sites (probing whether the vulnerability can be triggered from there), and then all the way up to an interface (such as a syscall) where a potential exploit can be manifested. And those would be the low-hanging fruit; other vulnerabilities may require the interplay of multiple functions, or race conditions.
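The bubbling-up idea can be sketched as a walk up the call graph, asking the model at each caller whether the candidate finding is still triggerable. Everything below is hypothetical scaffolding: the call graph is a toy, and `probe_fn` is a stub standing in for the actual LLM query.

```python
# Minimal sketch of bubbling a candidate vulnerability up to an interface.
# The call graph and probe_fn are placeholders, not a real analysis.
from collections import deque

# callee -> set of direct callers (toy example)
CALL_GRAPH = {
    "parse_header": {"handle_packet"},
    "handle_packet": {"sys_recvmsg"},
    "sys_recvmsg": set(),
}
ENTRY_POINTS = {"sys_recvmsg"}  # interfaces where an exploit could manifest

def probe_fn(fn, finding):
    """Stub for 'ask the model: can `finding` be triggered from `fn`?'
    A real harness would feed the model fn's source plus the finding."""
    return True  # canned optimistic answer for the sketch

def bubble_up(origin, finding):
    """BFS from the flagged function toward reachable entry points."""
    reachable_entries, seen = set(), {origin}
    queue = deque([origin])
    while queue:
        fn = queue.popleft()
        if fn in ENTRY_POINTS:
            reachable_entries.add(fn)
            continue
        for caller in CALL_GRAPH.get(fn, ()):
            if caller not in seen and probe_fn(caller, finding):
                seen.add(caller)
                queue.append(caller)
    return reachable_entries

print(bubble_up("parse_header", "heap overflow"))  # {'sys_recvmsg'}
```

Multi-function interactions and race conditions wouldn't fit this per-function walk; they'd need the harness to reason over sets of functions at once.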
[1] <https://github.com/karpathy/autoresearch>
[2] <https://deepmind.google/blog/alphaevolve-a-gemini-powered-co...>
[3] <https://arxiv.org/abs/2506.13131>
[4] <https://github.com/algorithmicsuperintelligence/openevolve>