Forth is a bit of a headtrip for people due to (a) how it parses and executes and (b) the stack-based nature of it. Particularly, symbols traditionally considered syntactic are fair game* for user-space definitions, and some words are "immediate" which means that they act at compile time, where others act at runtime. This combination seems particularly challenging for students of languages which don't mix compile-time and runtime, only use a stack for function calls, etc.
I'm of the opinion that one must write a forth to grok forth, and I'm far from alone in that. However, I'd not quite call forth itself esoteric. But to many, it's a language family... of which most of the members are esoteric inasmuch as they only have a single user.
* shout-out to my friend whose forth supports lists, hash tables, and structs through syntactic sugar... https://github.com/cstrainge/sorth
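For anyone who hasn't seen the compile-time vs. runtime split in action, here is a rough sketch of it in Python. This is a toy and not any real Forth: the `TinyForth` class and its method names are made up for illustration, and `:` is special-cased in the loop, whereas in a real Forth it would be an ordinary word that parses the next token itself. The point is only that `;` is a plain dictionary entry flagged as immediate, so it runs while everything around it is being compiled.

```python
# Toy Forth-like interpreter. A sketch under simplifying assumptions, not a real Forth.
class TinyForth:
    def __init__(self):
        self.stack = []          # data stack
        self.compiling = False   # False = interpreting, True = inside ": ... ;"
        self.current = None      # (name, body) of the definition being compiled
        self.immediate = {";"}   # words that execute even while compiling
        self.words = {
            "+": lambda: self.stack.append(self.stack.pop() + self.stack.pop()),
            "*": lambda: self.stack.append(self.stack.pop() * self.stack.pop()),
            "dup": lambda: self.stack.append(self.stack[-1]),
            ";": self._end_definition,  # "syntax" is just another dictionary entry
        }

    def _end_definition(self):
        # Runs at compile time: install the new word and leave compile mode.
        name, body = self.current
        self.words[name] = lambda: [step() for step in body]
        self.compiling, self.current = False, None

    def _compile(self, tok):
        # Turn one token into a callable: a known word, or a number literal.
        if tok in self.words:
            return self.words[tok]
        n = int(tok)  # raises ValueError on unknown words
        return lambda: self.stack.append(n)

    def run(self, source):
        tokens = iter(source.split())
        for tok in tokens:
            if tok == ":":
                # Simplification: a real Forth's ":" is itself a word.
                self.current = (next(tokens), [])
                self.compiling = True
            elif self.compiling and tok not in self.immediate:
                self.current[1].append(self._compile(tok))  # store, don't run
            else:
                self._compile(tok)()                        # run right now
        return self.stack


forth = TinyForth()
forth.run(": square dup * ;")   # nothing executes except ";" itself
print(forth.run("7 square"))    # [49]
```

Defining `square` leaves the stack untouched; `dup` and `*` only run later, when `square` is called.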
Why is RPN a head trip? Or rather... I don't think it's as much a head trip as prefix or infix notation. You've got your data. You've got your operation. Sometimes the data is on a stack. Sometimes it's on a heap. Just a little different way of specifying which is where.
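To make the "data, then operation" point concrete, here is a minimal RPN evaluator sketched in Python. The name `eval_rpn` and the choice of four operators are my own for illustration; it handles numbers and binary arithmetic only.

```python
import operator

# Binary operators this sketch understands.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}

def eval_rpn(expression):
    """Evaluate a whitespace-separated RPN expression and return a float."""
    stack = []
    for token in expression.split():
        if token in OPS:
            right = stack.pop()   # top of stack is the right-hand operand
            left = stack.pop()
            stack.append(OPS[token](left, right))
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]

print(eval_rpn("2 3 4 * +"))   # 14.0
print(eval_rpn("2 3 + 4 *"))   # 20.0
```

There is no precedence table and no parentheses anywhere: the order of the tokens is the order of evaluation.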
Also... +1 on the "you've got to write it to understand it." And in the 80s, the documentation wasn't super. Leo Brodie's book was great to get you started, but understanding things like ' (tick) and how to program in "idiomatic forth" was a challenge. So I would add, "not only do you have to code FORTH to understand FORTH, you also have to rip someone else's FORTH program apart to understand the more advanced bits." -- I could be wrong about that today, it's been a while since I did a survey of FORTH documentation.
> Why is RPN a head trip?
It isn't how we teach math in the two countries I've lived in. Lisp is just as weird. Many people see a mathematical expression and panic. A level up from them, people see a mathematical expression in some source code and expect it to respect the symbol precedence that they were taught in grade school -- which they can "understand" without knowing how the language parses and abstracts all that away. And maybe their understanding is flawed but they can survive as programmers for decades without ever going deeper. Lisps, FORTHs, etc., don't allow you to proceed without understanding.
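As a small illustration of what the parser quietly does for infix users, here is a sketch that uses Python's own `ast` module to turn an infix expression into the RPN a Forth programmer would have to write by hand. The function name `to_rpn` is made up, and it only handles number literals and the four basic binary operators.

```python
import ast

# Map Python's AST operator node types to their RPN tokens.
SYMBOLS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

def to_rpn(infix):
    """Convert a simple infix arithmetic expression to an RPN string."""
    def walk(node):
        if isinstance(node, ast.Constant):
            return [str(node.value)]
        if isinstance(node, ast.BinOp) and type(node.op) in SYMBOLS:
            # Post-order walk: operands first, operator last.
            return walk(node.left) + walk(node.right) + [SYMBOLS[type(node.op)]]
        raise ValueError("unsupported syntax: " + ast.dump(node))
    # The parser has already applied precedence and parentheses for us.
    return " ".join(walk(ast.parse(infix, mode="eval").body))

print(to_rpn("2 + 3 * 4"))     # 2 3 4 * +
print(to_rpn("(2 + 3) * 4"))   # 2 3 + 4 *
```

The grade-school precedence rules live entirely inside `ast.parse`; by the time the tree exists, the evaluation order is as explicit as it is in RPN.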
My bigger problem with reading other people's Forth is that no two Forths are the same. The language itself isn't really the problem. It is that you can only approach someone else's Forth code bottom-up; otherwise you just don't understand at all what is going on. Most other languages allow you to dive in from the top, learn as much abstraction as required to get the job done, and then move on.
I was famous at IBM for the quip: "The good news about FORTH is you can use it to write your own DSLs to model the problem you're working on. The bad news is the person down the hall already has." But we still used a metric butt-load of FORTH for board bring-up and firmware.
But more to your point. FORTH was used in a time when the predominant mode of coding was to construct more complex programs from less complex programs, so application developers usually got their hands dirty with some lower level aspects. That is... the application programmers chose the lower level abstractions they wanted to use. Now that we're beyond that and have, as an industry, decided that van Rossum and Lattner are the only people who are allowed to define low-level abstractions, it's a lot harder to do that.
(Again, file this one under "old man yells at cloud.")
[2nd edit]
Maybe it's best to think of FORTH as a DSL construction kit. Lisp is kinda-sorta the same way. As you point out, it's super easy for someone to develop DSLs that require coders to understand not only their application domain, but also how the language and its underlying hardware abstractions operate. And when you're using someone else's DSL, you have to understand how they thought about the problem domain. And we stopped teaching how to analyze that in the 80s.
Yes, for bringup it is a very nice tool. Just enough, not too much. I used it for a 68K board in exactly that role.