Totally agree. I am looking to build something more complex next, something like PS1 in a different language, as a test. That would require significantly more effort, but given how quickly the models are improving, I am optimistic.

It seems the most difficult part is automating the performance optimizations.

For example: "I've run this task on real hardware and it took 5 seconds. Keep optimizing and iterating until you achieve a similar time."

I'd love to see a Linux emulator running on Dart, simply because it would remove the need for dependencies on each platform.