If you're on a SOTA model, your context window is less than 100k tokens, and you don't have any vague or contradictory rules, then I've almost never seen a rule broken.

The most common failures I've seen come from tools that pollute their context with crap, so the LLM forgets stuff or just gets confused by all the irrelevant sentences. If the report is true, that's probably what these AI notetakers are guilty of. The problem gets worse if these tools turn on the 1M context window version.

Yeah, that's exactly why I have full confidence in that system, especially for medical notetaking. /s