There was a CMOVE architecture around 1990 (in Israel), I think. It was very similar. Sadly, I could not find it on the internet.
Move architectures may work best for digital signal processors, because the data flow in such processors is almost constant.
I invented my own version of the move-only architecture (around 1992), but with a focus on speed. Here is the idea:
1. The CPU only moves data within the CPU, e.g. from one register to another, so all moves are extremely fast.
2. The CPU is divided into separate units that can work independently. Each unit has its own input and output ports, and the ports and registers are connected via a bus.
3. The CPU can have multiple buses and thus perform several moves at the same time. If an output value is not ready yet, the instruction waits.
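Here is a minimal Python sketch of the issue rule in points 2 and 3, assuming that one instruction holds at most one move per bus and that the whole instruction stalls until every source port it reads is ready. The names `Ports`, `step` and the port labels are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# A move copies the value of a source (output) port to a destination (input) port.
Move = Tuple[str, str]  # (source_port, destination_port); hypothetical port names


@dataclass
class Ports:
    # Current value of each output port; None means "not ready yet".
    outputs: Dict[str, Optional[int]] = field(default_factory=dict)
    # Last value delivered to each input port.
    inputs: Dict[str, int] = field(default_factory=dict)


def step(ports: Ports, instruction: List[Move], num_buses: int) -> bool:
    """Try to issue one instruction; return False if it has to wait (stall)."""
    if len(instruction) > num_buses:
        raise ValueError("instruction has more moves than there are buses")
    # Assumption: the whole instruction waits until every source is ready.
    if any(ports.outputs.get(src) is None for src, _ in instruction):
        return False
    # All moves of the instruction happen at the same time, one per bus.
    for src, dst in instruction:
        ports.inputs[dst] = ports.outputs[src]
    return True


p = Ports(outputs={"OUT1": 5, "OUT2": None})
print(step(p, [("OUT1", "IN1"), ("OUT2", "IN2")], num_buses=2))  # False: OUT2 not ready
p.outputs["OUT2"] = 7
print(step(p, [("OUT1", "IN1"), ("OUT2", "IN2")], num_buses=2))  # True
print(p.inputs)  # {'IN1': 5, 'IN2': 7}
```

A real design might instead let the ready moves proceed and only hold back the stalled bus; stalling the whole instruction is just the simplest rule to sketch.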
Example instruction: OUT1 -> IN1, OUT2 -> IN2. With 32 bits, this gives four 8-bit port addresses (3 bits for the unit, 5 bits for the port), i.e. 8 units with 32 ports each.
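A sketch of how such a 32-bit, two-move instruction could be packed and unpacked in Python. The field order (first source in the top byte, then first destination, second source, second destination) and the function names are my own assumptions for illustration.

```python
from typing import List, Tuple

UNIT_BITS = 3                        # 2**3 = 8 units
PORT_BITS = 5                        # 2**5 = 32 ports per unit
ADDR_BITS = UNIT_BITS + PORT_BITS    # 8 bits per port address
ADDR_MASK = (1 << ADDR_BITS) - 1
PORT_MASK = (1 << PORT_BITS) - 1

Port = Tuple[int, int]               # (unit, port)


def port_addr(unit: int, port: int) -> int:
    """Combine a unit number and a port number into one 8-bit address."""
    if not (0 <= unit < 1 << UNIT_BITS and 0 <= port < 1 << PORT_BITS):
        raise ValueError("unit or port out of range")
    return (unit << PORT_BITS) | port


def encode(moves: List[Tuple[Port, Port]]) -> int:
    """Pack two (source, destination) moves into one 32-bit word."""
    if len(moves) != 2:
        raise ValueError("expected exactly two moves")
    word = 0
    for src, dst in moves:
        word = (word << ADDR_BITS) | port_addr(*src)
        word = (word << ADDR_BITS) | port_addr(*dst)
    return word


def decode(word: int) -> List[Tuple[Port, Port]]:
    """Unpack a 32-bit word back into two (source, destination) moves."""
    fields = [(word >> shift) & ADDR_MASK for shift in (24, 16, 8, 0)]
    addrs = [(f >> PORT_BITS, f & PORT_MASK) for f in fields]
    return [(addrs[0], addrs[1]), (addrs[2], addrs[3])]


word = encode([((1, 2), (3, 4)), ((0, 31), (7, 0))])
print(hex(word))      # 0x22641fe0
print(decode(word))   # [((1, 2), (3, 4)), ((0, 31), (7, 0))]
```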
Example set of units and ports:
Control unit: (JUMP_to_address, CALL_to_address, RETURN_with_value, + conditionals)
Memory unit: (STORE_Address, STORE_Value, READ_Address, READ_Value)
Computation unit: (Start_Value, ADD_Value, SUB_Value, MUL_Value, DIV_Value, Result_Value)
Value unit: (Value_from_next_instruction, ZERO, ONE)
Register unit: (R0 ... R31)
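To make the port list concrete, here is a tiny single-bus interpreter in Python covering only the Computation, Value and Register units (Control and Memory are left out). The accumulator behaviour is an assumption on my part: writing to Start_Value loads it, the ADD/SUB/MUL/DIV ports apply that operation, and Result_Value reads it back. The class names are hypothetical.

```python
from typing import Dict, List, Tuple, Union


class ComputationUnit:
    """Hypothetical accumulator semantics for the Computation unit's ports."""

    def __init__(self) -> None:
        self.acc = 0

    def write(self, port: str, value: int) -> None:
        if port == "Start_Value":
            self.acc = value
        elif port == "ADD_Value":
            self.acc += value
        elif port == "SUB_Value":
            self.acc -= value
        elif port == "MUL_Value":
            self.acc *= value
        elif port == "DIV_Value":
            self.acc //= value  # assumption: Python floor division
        else:
            raise KeyError(port)

    def read(self, port: str) -> int:
        if port != "Result_Value":
            raise KeyError(port)
        return self.acc


class Machine:
    def __init__(self) -> None:
        self.regs: Dict[str, int] = {f"R{i}": 0 for i in range(32)}
        self.comp = ComputationUnit()

    def run(self, program: List[Union[Tuple[str, str], int]]) -> None:
        pc = 0
        while pc < len(program):
            src, dst = program[pc]  # one move per instruction (single bus)
            pc += 1
            if src == "Value_from_next_instruction":
                value = program[pc]  # immediate stored in the next program slot
                pc += 1
            elif src == "ZERO":
                value = 0
            elif src == "ONE":
                value = 1
            elif src in self.regs:
                value = self.regs[src]
            else:
                value = self.comp.read(src)
            if dst in self.regs:
                self.regs[dst] = value
            else:
                self.comp.write(dst, value)


# R0 = (6 + 1) * 3, expressed purely as moves
program = [
    ("Value_from_next_instruction", "Start_Value"), 6,
    ("ONE", "ADD_Value"),
    ("Value_from_next_instruction", "MUL_Value"), 3,
    ("Result_Value", "R0"),
]
m = Machine()
m.run(program)
print(m.regs["R0"])  # 21
```

Note that there is no "add instruction" anywhere: the operation is selected purely by which port the value is moved into.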
It is extremely flexible. I also came up with a minimalist 8-bit version. One could even "plug in" different units for different systems, and certain problems could be solved by adding special ports, which would act like special instructions.
I did not continue the project because people did not understand the bus architecture (similar to a PCI bus). If you try to present it as a logic-gate design (as in the article), the units make the architecture look more complicated than it actually is.
Sounds similar to TIS-100, but with even more special-purpose units.