> try using obscure CPUs

I tried asking Gemini and ChatGPT, "What opcode has the value 0x3c on the Intel 8048?"

They were both wrong. The datasheet with the correct encodings is easily found online. And there are several correct open source emulators, eg MAME.

Even on a specific STM microcontroller (STM32G031), the LLM tools invent non-existent registers and then apologize when I point it out. And conversely, they write code for an entire algorithm (CRC, for example) when hardware support already exists on the chip.

Think of "What opcode has the value 0x3c on the Intel 8048" as a PNG image but the LLM like a very compressed JPEG. It will only get a very approximate answer. But you can give it explicit tools to look up things.

If the LLM doesn't have a websearch tool your test doesn't make any sense.

An LLM by itself is like a lossy image of all text in the internet.

Just some more parameters, and it would overfit that specific PDF too.