You can check out this recent paper on scalable formal verification of LLMs, "BEAVER: An Efficient Deterministic LLM Verifier": https://arxiv.org/abs/2512.05439
This paper looks pretty groundbreaking. The ability to verify LLMs at scale (e.g., 70B-parameter models) on real-world tasks like math reasoning and code security is extremely impressive and impactful.