This is incredibly software-engineer-brained. The law doesn't work like software. The only thing that matters is how the judiciary interprets the text, and if you try to use LLM "test" output to argue for a specific interpretation, you'll be laughed out of court.

I believe you hastily misinterpreted the point. It's merely a tool that wasn't possible before.