Hi! This is a totally fair question, and I appreciate you raising it. Getting reliable performance out of an LLM on something as structured as a schematic is hard, and I don’t want to pretend this is a solved problem or that the tool is infallible.

Benchmarking is tricky right now because there aren’t many true “LLM ERC” systems to compare against. You could compare against traditional ERC, but this tool is meant to complement that workflow, not replace it. For this initial MVP, most of the accuracy work has come from collecting real shipped-board schematics (mine and friends’) with known issues and iterating until the tool consistently detected them. A practical way to evaluate it yourself is to upload designs you already know have issues, along with the relevant datasheets, and see how well it picks them up. And if you have a schematic with known mistakes and are open to sharing it, feel free to reach out through the "contact us" page. Contributions like that are incredibly helpful, and I’d be happy to provide additional free usage in return.

I’ll also be publishing case studies soon with concrete examples: the original schematics, the tool’s output, what it caught (and what it missed), and comparisons against general-purpose chat LLM responses.

The goal isn’t to replace a designer’s judgment, but to surface potential issues that are easy to miss, similar to how AI coding tools flag things you still have to evaluate yourself. Ultimately the designer decides what’s valid and what isn’t.

I really appreciate the push for rigor, and I’ll follow up once the case studies are live.