From their video, it looks like they just stimulate different left/right neurons depending on where the enemy is on screen, then listen to some output that also says left/right. The shooting looks completely random, to be frank.

If you connected electrodes to two different fish, shocked them and interpreted twitching as intelligent output, fish could also play Doom. The interface is doing all the work.

It doesn't sound like the neurons have any concept of the game other than "left input means left output", which is a rather trivial result. It's effectively no different from the Pong example.

They don't say anything about how much training is required for this to happen, or whether there's any "learning" going on at all. The learning part is "next".