"if you know one forth, you know one forth"

So implement four of them, and you will know them all! The first Forth with indirect threaded code, the second with direct threaded code, the third with subroutine threaded code, and the final fourth with token threaded code.

You jest, but I did end up doing just that in my implementation (https://github.com/romforth/romforth) while trying to shoehorn a Forth into an MSP430 device with just 2KB of ROM and 128 bytes of RAM.

I thought this was going to be a pun on the word "fourth"; I was disappointed when I got to "final".

I doubt you will want to code professionally in Forth unless you work on embedded systems, so the dialect you learn doesn't matter too much. But it is interesting to implement a small interpreter and play with it.
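
To give a feel for how small such an interpreter can be, here is a rough sketch of a token-threaded inner loop in Python. It is only an illustration: the names (run, primitives, the token numbers) are made up for this example and are not taken from any real Forth, and a real implementation would use a compact byte array and a return stack rather than Python lists.

```python
# Toy token-threaded interpreter: an illustrative sketch, not a real Forth.
# All names and token numbers below are hypothetical.

LIT, ADD, MUL, DUP = range(4)  # each token is just a small integer


def run(code, stack=None):
    """Execute a list of tokens and return the data stack."""
    stack = [] if stack is None else stack
    ip = 0  # instruction pointer into the token list

    def lit():
        nonlocal ip
        stack.append(code[ip])  # the next cell holds an inline literal
        ip += 1

    def add():
        b, a = stack.pop(), stack.pop()
        stack.append(a + b)

    def mul():
        b, a = stack.pop(), stack.pop()
        stack.append(a * b)

    def dup():
        stack.append(stack[-1])

    # Token table: a token is an index into this list, not an address.
    primitives = [lit, add, mul, dup]

    while ip < len(code):
        token = code[ip]
        ip += 1
        primitives[token]()  # look the token up, then execute it
    return stack


# Roughly the Forth phrase: 3 DUP * 4 +
program = [LIT, 3, DUP, MUL, LIT, 4, ADD]
print(run(program))
```

The table lookup is what makes this token threaded: the compiled program stores small indexes instead of addresses, which trades some speed for code size. The other three styles differ mainly in what each cell holds (an address of a code pointer, an address of machine code, or a call instruction).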